Acoustic Modeling with Digital Delay

Delay effects, such as phasing, flanging, chorus, and artificial reverberation, as well as digital waveguide models (Chapter 6), are built using delay lines, simple digital filter sections, and sometimes nonlinear elements and modulation. We will focus on these elements in the simpler context of delay effects before using them for sound synthesis.

Delay Lines

The delay line is an elementary functional unit which models acoustic propagation delay. It is a fundamental building block of both delay-effects processors and digital-waveguide synthesis models. The function of a delay line is to introduce a time delay between its input and output, as shown in Fig.2.1.

Figure 2.1: The $ M$-sample delay line.

Let the input signal be denoted $ x(n),\, n=0,1,2,\ldots$, and let the delay-line length be $ M$ samples. Then the output signal $ y(n)$ is specified by the relation

$\displaystyle y(n) = x(n-M),\quad n=0,1,2,\ldots \protect$ (3.1)

where $ x(n)\isdef 0$ for $ n<0$.

Before the digital era, delay lines were expensive and imprecise in ``analog'' form. For example, ``spring reverberators'' (common in guitar amplifiers) use metal springs as analog delay lines; while adequate for that purpose, they are highly dispersive and prone to noise pick-up. Large delays require prohibitively long springs or coils in analog implementations. In the digital domain, on the other hand, delay by $ N$ samples is trivially implemented, and non-integer delays can be implemented using interpolation techniques, as discussed later in §4.1.

A Software Delay Line

In software, a delay line is often implemented using a circular buffer. Let D denote an array of length $ M$. Then we can implement the $ M$-sample delay line in the C programming language as shown in Fig.2.2.

Figure 2.2: The $ M$-sample delay line.

    /* delayline.c */
    static double D[M];  // initialized to zero
    static long ptr=0;   // read-write offset

    double delayline(double x)
      double y = D[ptr];    // read operation
      D[ptr++] = x;         // write operation
      if (ptr >= M) { ptr -= M; } // wrap ptr
//    ptr %= M; // modulo-operator syntax
      return y;

Delay lines of this type are typically used in digital reverberators and other acoustic simulators involving fixed propagation delays. Later, in Chapter 5, we will consider time-varying delay lengths.

Acoustic Wave Propagation Simulation

Delay lines can be used to simulate acoustic wave propagation. We start with the simplest case of a pure traveling wave, followed by the more general case of spherical waves. We then look at the details of a simple acoustic echo simulation using a delay line to model the difference in time-of-arrival between the direct and reflected signals.

Traveling Waves

In acoustic wave propagation, pure delays can be used to simulate traveling waves. A traveling wave is any kind of wave which propagates in a single direction with negligible change in shape. An important class of traveling waves is ``plane waves'' in air which create ``standing waves'' in rectangular enclosures such as ``shoebox'' shaped concert halls. Also, far away from any acoustic source (where ``far'' is defined as ``many wavelengths''), the direct sound emanating from any source can be well approximated as a plane wave, and thus as a traveling wave.

Another case in which plane waves dominate is the cylindrical bore, such as the bore of a clarinet or the straight tube segments of a trumpet. Additionally, the vocal tract is generally simulated using plane waves, though in this instance there is a higher degree of approximation error.

Transverse and longitudinal waves in a vibrating string, such as on a guitar, are also nearly perfect traveling waves, and they can be simulated to a very high degree of perceptual accuracy by approximating them as ideal, while implementing slight losses and dispersion once per period (i.e., at one particular point along the ``virtual string'').

In a conical bore, we find sections of spherical waves taking the place of plane waves. However, they still ``travel'' like plane waves, and we can still use a delay line to simulate their propagation. The same applies to spherical waves created by a ``point source.'' Spherical waves will be considered on page [*].

Damped Traveling Waves

Figure 2.3: Damped traveling-wave simulator. In addition to propagation delay by $ M$ samples, there is attenuation by $ g^M<1$.

The delay line shown in Fig.2.1 on page [*] can be used to simulate any traveling wave, where the traveling wave must propagate in one direction with a fixed waveshape. If a traveling wave attenuates as it propagates, with the same attenuation factor at each frequency, the attenuation can be simulated by a simple scaling of the delay line output (or input), as shown in Fig.2.3. This is perhaps the simplest example of the important principle of lumping distributed losses at discrete points. That is, it is not necessary to implement a small attenuation $ g$ for each time-step of wave propagation; the same result is obtained at the delay-line output if propagation is ``lossless'' within the delay line, and the total cumulative attenuation $ g^M$ is applied at the output. The input-output simulation is exact, while the signal samples inside the delay line are simulated with a slight gain error. If the internal signals are needed later, they can be tapped out using correcting gains. For example, the signal half way along the delay line can be tapped using a coefficient of $ g^{M/2}$ in order to make it an exact second output. In summary, computational efficiency can often be greatly increased at no cost to accuracy by lumping losses only at the outputs and points of interaction with other simulations.

Modeling traveling-wave attenuation by a scale factor is only exact physically when all frequency components decay at the same rate. For accurate acoustic modeling, it is usually necessary to replace the constant scale factor $ g$ by a digital filter $ G(z)$ which implements frequency-dependent attenuation, as depicted in Fig.2.4. In principle, a linear time-invariant (LTI) filter can provide an independent attenuation factor at each frequency. Section 2.3 addresses this case in more detail. Frequency-dependent damping substitution will be used in artificial reverberation design in §3.7.4.

Figure 2.4: Damped traveling-wave simulator. In principle, the digital filter $ G(z)$ provides an arbitrary attenuation corresponding to one sample of wave propagation at each frequency. Physically, we must have $ \left \vert G(e^{j\omega T})\right \vert\leq 1$ for all $ \omega $.

Dispersive Traveling Waves

In many acoustic systems, such as piano strings9.4.1C.6), wave propagation is also significantly dispersive. A wave-propagation medium is said to be dispersive if the speed of wave propagation is not the same at all frequencies. As a result, a propagating wave shape will ``disperse'' (change shape) as its various frequency components travel at different speeds. Dispersive propagation in one direction can be simulated using a delay line in series with a nonlinear phase filter, as indicated in Fig.2.5. If there is no damping, the filter $ A(z)$ must be all-pass [449], i.e., $ \vert A(e^{j\omega T})\vert=1$ for all $ \omega T\in[-\pi,\pi]$.

Figure 2.5: Dispersive traveling-wave simulator. In principle, the digital filter $ A(z)$ provides an arbitrary nonnegative delay corresponding to one sample of wave propagation at each frequency.

Converting Propagation Distance to Delay Length

We may regard the delay-line memory itself as the fixed ``air'' which propagates sound samples at a fixed speed $ c$ ($ c=345$ meters per second at $ 22$ degrees Celsius and 1 atmosphere). The input signal $ x(n)$ can be associated with a sound source, and the output signal $ y(n)$ (see Fig.2.1 on page [*]) can be associated with the listening point. If the listening point is $ d$ meters away from the source, then the delay line length $ M$ needs to be

$\displaystyle M = \frac{d}{cT} \quad{\hbox{samples}},

where $ T$ denotes the discrete-time sampling interval. In other words, the number of samples delay is the propagation distance $ d$ divided by $ cT$, the distance sound propagates in one sampling interval. In practice, rounding $ M=d/cT$ to the nearest integer causes no audible difference, unless the echo time is so short that the system is not really perceived as an echo (we'll learn about ``comb filters'' in §2.6 below).

Spherical Waves from a Point Source

Acoustic theory tells us that a point source produces a spherical wave in an ideal isotropic (uniform) medium such as air. Furthermore, the sound from any radiating surface can be computed as the sum of spherical wave contributions from each point on the surface (including any relevant reflections). The Huygens-Fresnel principle explains wave propagation itself as the superposition of spherical waves generated at each point along a wavefront (see, e.g., [349, p. 175]). Thus, all linear acoustic wave propagation can be seen as a superposition of spherical traveling waves.

To a good first approximation, wave energy is conserved as it propagates through the air. In a spherical pressure wave of radius $ r$, the energy of the wavefront is spread out over the spherical surface area $ 4\pi r^2$. Therefore, the energy per unit area of an expanding spherical pressure wave decreases as $ 1/r^2$. This is called spherical spreading loss. It is also an example of an inverse square law which is found repeatedly in the physics of conserved quantities in three-dimensional space. Since energy is proportional to amplitude squared, an inverse square law for energy translates to a $ 1/r$ decay law for amplitude.

Figure 2.6: Geometry of wave propagation from a point source $ \mathbf {x}_1$ to a listening point $ \mathbf {x}_2$.

The sound-pressure amplitude of a traveling wave is proportional to the square-root of its energy per unit area. Therefore, in a spherical traveling wave, acoustic amplitude is proportional to $ 1/r$, where $ r$ is the radius of the sphere. In terms of Cartesian coordinates, the amplitude $ p(\mathbf{x}_2)$ at the point $ \mathbf{x}_2 =
(x_2,y_2,z_2)$ due to a point source located at $ \mathbf{x}_1 =
(x_1,y_1,z_1)$ is given by

$\displaystyle p(\mathbf{x}_2) = \frac{p_1}{r_{12}}

where $ p_1$ is defined as the pressure amplitude one radial unit from the point source located at $ \mathbf{x}=\mathbf{x}_1$ (i.e., $ p_1=p(\mathbf{x}_1+\mathbf{e})$ where $ \vert\vert\,\mathbf{e}\,\vert\vert =1$), and $ r_{12}$ denotes the distance from the point $ \mathbf {x}_1$ to $ \mathbf {x}_2$:

$\displaystyle r_{12} \isdef \left\Vert\,\mathbf{x}_2 - \mathbf{x}_1\,\right\Vert
= \sqrt{(x_2-x_1)^2 + (y_2-y_1)^2 + (z_2-z_1)^2}

This geometry is depicted for the 2D case in Fig.2.6.

In summary, every point of a radiating sound source emits spherical traveling waves in all directions which decay as $ 1/r$, where $ r$ is the distance from the source. The amplitude-decay by $ 1/r$ can be considered a consequence of energy conservation for propagating waves. (The energy spreads out over the surface of an expanding sphere.) We often visualize such waves as ``rays'' emanating from the source, and we can simulate them as a delay line along with a $ 1/r$ scaling coefficient (see Fig.2.7). In contrast, since plane waves propagate with no decay at all, each ``ray'' can be considered lossless, and the simulation involves only a delay line with no scale factor, as shown in Fig.2.1 on page [*].

Figure 2.7: Point-to-point spherical wave simulator. In addition to propagation delay, there is attenuation by $ g=1/r$.

Reflection of Spherical or Plane Waves

When a spreading spherical wave reaches a wall or other obstacle, it is either reflected or scattered. A wavefront is reflected when it impinges on a surface which is flat over at least a few wavelengths in each direction.3.1 Reflected wavefronts can be easily mapped using ray tracing, i.e., the reflected ray leaves at an angle to the surface equal to the angle of incidence (``law of reflection''). Wavefront reflection is also called specular reflection, especially when considering light waves.

A wave is scattered when it encounters a surface which has variations on the scale of the spatial wavelength. A scattering reflection is also called a diffuse reflection. As a special case, objects smaller than a wavelength yield a diffuse reflection which approaches a spherical wave as the object approaches zero volume. More generally, each point of a scatterer can be seen as emitting a new spherically spreading wavefront in response to the incoming wave--a decomposition known as Huygen's principle, as mentioned in the previous section. The same process happens in reflection, but the hemispheres emitted by each point of the flat reflecting surface combine to form a more organized wavefront which is the same as the incident wave but traveling in a new direction.

The distinction between specular and diffuse reflections is dependent on frequency. Since sound travels approximately 1 foot per millisecond, a cube 1 foot on each side will ``specularly reflect'' directed ``beams'' of sound energy above $ 1$ KHz, and will ``diffuse'' or scatter sound energy below $ 1$ KHz. A good concert hall, for example, will have plenty of diffusion. As a general rule, reverberation should be diffuse in order to avoid ``standing waves'' (isolated energetic modes). In other words, in reverberation, we wish to spread the sound energy uniformly in both time and space, and we do not want any specific spatial or temporal patterns in the reverberation.

An Acoustic Echo Simulator

An acoustic echo is one of the simplest acoustic modeling problems. Echoes occur when a sound arrives via more than one acoustic propagation path, as shown in Fig.2.8. We may hear a discrete echo, for example, if we clap our hands standing in front of a large flat wall outdoors, such as the side of a building. To be perceived as an echo, however, the reflection must arrive well after the direct signal (or previous echo).

Figure 2.8: Geometry of an acoustic echo caused by ``multipath'' propagation. A direct signal and a ``floor bounce'' are received from the source S at the listening point L.

A common cause of echoes is ``multipath'' wave propagation, as diagrammed in Fig.2.8. The acoustic source is denoted by `S', the listener by `L', and they are at the same height $ h$ meters from a reflecting surface. The direct path is $ d$ meters long, while the length of the single reflection is $ 2r$ meters. These quantities are of course related by the Pythagorean theorem:

$\displaystyle r^2 = h^2 + \left(\frac{d}{2}\right)^2

Figure 2.9: Acoustic echo simulation using a delay line, gain, and summer.

Figure 2.9 illustrates an echo simulator for the case of a direct signal and single echo, as shown in Fig.2.8. It is common practice to pull out and discard any common delay which affects all signals equally, since such a delay does not affect timbre; thus, the direct signal delay is not implemented at all in Fig.2.9. Similarly, it is not necessary to implement the attenuation of the direct signal due to propagation, since it is the relative amplitude of the direct signal and its echoes which affect timbre.

From the geometry in Fig.2.8, we see that the delay-line length in Fig.2.9 should be

$\displaystyle M = \frac{2r-d}{cT}


$\displaystyle g = \frac{1/2r}{1/d} = \frac{d}{2r},

where $ c$ is the speed of sound and $ T$ denotes the sampling interval. We may eliminate $ r$ using the substitution $ r=\sqrt{h^2+(d/2)^2}$, leaving only two independent variables: the height $ h$ of the source above the reflecting surface, and the distance $ d$ between the source and listener:

$\displaystyle g = \frac{1}{\sqrt{1+(2h/d)^2}}

Program for Acoustic Echo Simulation

The following main program (Fig.2.10) simulates a simple acoustic echo using the delayline function in Fig.2.2. It reads a sound file and writes a sound file containing a single, discrete echo at the specified delay. For simplicity, utilities from the free Synthesis Tool Kit (STK) (Version 4.2.x) are used for sound input/output [86].3.2

Figure: Program in C++ for implementing the echo simulator of Fig.2.9 using the free, open-source Synthesis Tool Kit (STK).

/* Acoustic echo simulator, main C++ program.
   Compatible with STK version 4.2.1.
   Usage: main inputsoundfile
   Writes main.wav as output soundfile

#include "FileWvIn.h"  /* STK soundfile input support */
#include "FileWvOut.h" /* STK soundfile output support */

static const int M = 20000; /* echo delay in samples */
static const int g = 0.8;   /* relative gain factor */

#include "delayline.c" /* defined previously */

int main(int argc, char *argv[])
  long i;
  FileWvIn input(argv[1]);  /* read input soundfile */
  FileWvOut output("main"); /* creates main.wav */
  long nsamps = input.getSize();
  for (i=0;i<nsamps+M;i++)   {
    StkFloat insamp = input.tick();
    output.tick(insamp + g * delayline(insamp));

In summary, a delay line simulates the time delay associated with wave propagation in a particular direction. Attenuation (e.g., by $ 1/r$) associated with ray propagation can of course be simulated by multiplying the delay-line output by some constant $ g$.

Lossy Acoustic Propagation

Attenuation of waves by spherical spreading, as described in §2.2.5 above, is not the only source of amplitude decay in a traveling wave. In air, there is always significant additional loss caused by air absorption. Air absorption varies with frequency, with high frequencies usually being more attenuated than low frequencies, as discussed in §B.7.15. Wave propagation in vibrating strings undergoes an analogous absorption loss, as does the propagation of nearly every other kind of wave in the physical world. To simulate such propagation losses, we can use a delay line in series with a nondispersive filter, as illustrated in §2.2.2 above. In practice, the desired attenuation at each frequency becomes the desired magnitude frequency-response of the filter in Fig.2.4, and filter-design software (typically matlab) is used to compute the filter coefficients to approximate the desired frequency response in some optimal way. The phase response may be linear, minimum, or left unconstrained when damping-filter dispersion is not considered harmful. There is typically a frequency-dependent weighting on the approximation error corresponding to audio perceptual importance (e.g., the weighting $ 1/f$ is a simple example that increases accuracy at low frequencies). Some filter-design methods are summarized in §8.6.

Exponentially Decaying Traveling Waves

Let $ g(r,\omega)$ denote the decay factor associated with propagation of a plane wave over distance $ r$ at frequency $ \omega $ rad/sec. For an ideal plane wave, there is no ``spreading loss'' (attenuation by $ 1/r$). Under uniform conditions, the amount of attenuation (in dB) is proportional to the distance traveled; in other words, the attenuation factors for two successive segments of a propagation path are multiplicative:

$\displaystyle g(r_1+r_2,\omega) =

This property implies that $ g$ is an exponential function of distance $ r$.3.3

Frequency-independent air absorption is easily modeled in an acoustic simulation by making the substitution

$\displaystyle z^{-1}\leftarrow gz^{-1}

in the transfer function of the simulating delay line, where $ g$ denotes the attenuation associated with propagation during one sampling period ($ T$ seconds). Thus, to simulate absorption corresponding to an $ M$-sample delay, the difference equation Eq.$ \,$(2.1) on page [*] becomes

$\displaystyle y(n) = g^Mx(n-M),

as depicted in Fig.2.9.

Frequency-Dependent Air-Absorption Filtering

More generally, frequency-dependent air absorption can be modeled using the substitution

$\displaystyle z^{-1}\leftarrow G(z)z^{-1}

where $ G(z)$ denotes the filtering per sample in the propagation medium. Since air absorption cannot amplify a wave at any frequency, we have $ \left \vert G(e^{j\omega T})\right \vert\leq 1$. A lossy delay line for plane-wave simulation is thus described by

$\displaystyle Y(z) = G^M(z) z^{-M}X(z)

in the frequency domain, and

$\displaystyle y(n) = \underbrace{g\ast g\ast \dots \ast g \, \ast }_{\hbox{$M$\ times}} x(n-M)

in the time domain, where `$ \ast $' denotes convolution, and $ g(n)$ is the impulse response of the per-sample loss filter $ G(z)$. The effect of $ G(z)$ on the poles of the system is discussed in §3.7.4.

For spherical waves, the loss due to spherical spreading is of the form

$\displaystyle Y(z) \propto \frac{G^M(z) z^{-M}}{r}X(z)

where $ r$ is the distance from $ X$ to $ Y$. We see that the spherical spreading loss factor is ``hyperbolic'' in the propagation distance $ r$, while air absorption is exponential in $ r$.

Dispersive Traveling Waves

In addition to frequency-dependent attenuation, LTI filters can provide a frequency-dependent delay. This can be used to simulate dispersive wave propagation, as introduced in §2.2.3.


Up to now, we have been concerned with the simulation of traveling waves in linear, time-invariant (LTI) media. The main example considered was wave propagation in air, but waves on vibrating strings behave analogously. We saw that the point-to-point propagation of a traveling plane wave in an LTI medium can be simulated simply using only a delay line and an LTI filter. The delay line simulates propagation delay, while the filter further simulates (1) an independent attenuation factor at each frequency by means of its amplitude response (e.g., to simulate air absorption), and (2) a frequency-dependent propagation speed using its phase response (to simulate dispersion). If there is additionally spherical spreading loss, the amplitude may be further attenuated by $ 1/r$, where $ r$ is the distance from the source. For more details about the acoustics of plane waves and spherical waves, see, e.g., [318,349]. Appendix B contains a bit more about elementary acoustics,

So far we have considered only traveling waves going in one direction. The next simplest case is 1D acoustic systems, such as vibrating strings and acoustic tubes, in which traveling waves may propagate in two directions. Such systems are simulated using a pair of delay lines called a digital waveguide.

Digital Waveguides

A (lossless) digital waveguide is defined as a bidirectional delay line at some wave impedance $ R$ [430,433]. Figure 2.11 illustrates one digital waveguide.

Figure 2.11: A digital waveguide $ N$ samples long at wave-impedance $ R$.

As before, each delay line contains a sampled acoustic traveling wave. However, since we now have a bidirectional delay line, we have two traveling waves, one to the ``left'' and one to the ``right'', say. It has been known since 1747 [100] that the vibration of an ideal string can be described as the sum of two traveling waves going in opposite directions. (See Appendix C for a mathematical derivation of this important fact.) Thus, while a single delay line can model an acoustic plane wave, a bidirectional delay line (a digital waveguide) can model any one-dimensional linear acoustic system such as a violin string, clarinet bore, flute pipe, trumpet-valve pipe, or the like. Of course, in real acoustic strings and bores, the 1D waveguides exhibit some loss and dispersion3.4 so that we will need some filtering in the waveguide to obtain an accurate physical model of such systems. The wave impedance $ R$ (derived in Chapter 6) is needed for connecting digital waveguides to other physical simulations (such as another digital waveguide or finite-difference model).

Physical Outputs

Physical variables (force, pressure, velocity, ...) are obtained by summing traveling-wave components, as shown in Fig.2.12, and more elaborated in Fig.2.13.

Figure 2.12: Extracting a physical signal from a digital waveguide using delay-line taps.

Figure: More detailed diagram of Fig.2.12.

It is important to understand that the two traveling waves in a digital waveguide are now components of a more general acoustic vibration. The physical wave vibration is obtained by summing the left- and right-going traveling waves. A traveling wave by itself in one of the delay lines is no longer regarded as ``physical'' unless the signal in the opposite-going delay line is zero. Traveling waves are efficient for simulation, but they are not easily estimated from real-world measurements [476], except when the traveling-wave component in one direction can be arranged to be zero.

Note that traveling-wave components are not necessarily unique. For example, we can add a constant to the right-going wave and subtract the same constant from the left-going wave without altering the (physical) sum [263]. However, as derived in Appendix CC.3.6), 1D traveling-wave components are uniquely specified by two linearly independent physical variables along the waveguide, such as position and velocity (vibrating strings) or pressure and velocity (acoustic tubes).

Physical Inputs

A digital waveguide input signal corresponds to a disturbance of the 1D propagation medium. For example, a vibrating string is plucked or bowed by such an external disturbance. The result of the disturbance is wave propagation to the left and right of the input point. By physical symmetry, the amplitude of the left- and right-going propagating disturbances will normally be equal.3.5 If the disturbance superimposes with the waves already passing through at that point (an idealized case), then it is purely an additive input, as shown in Fig.2.14.

Figure 2.14: Summing a signal into a digital waveguide corresponding to a superimposing disturbance at one point. The original state is unaffected, i.e., the input signal enters the waveguide in superposition with whatever is already going on.

Note that the superimposing input of Fig.2.14 is the graph-theoretic transpose of the ideal output shown in Fig.2.13. In other words, the superimposing input injects by means of two transposed taps. Transposed taps are discussed further in §2.5.2 below.

In practical reality, physical driving inputs do not merely superimpose with the current state of the driven system. Instead, there is normally some amount of interaction with the current system state (when it is nonzero), as discussed further in the next section. Note that there are similarly no ideal outputs as depicted in Fig.2.13. Real physical ouputs must present some kind of load on the system (energy must be extracted). Superimposing inputs and non-loading outputs are ideals that are often approximated in real-world systems. Of course, in the virtual world, they are no problem at all--in fact, they are usually easier to implement, and more efficient.

Interacting Physical Input

Figure 2.15 shows the general case of an input signal that interacts with the state of the system at one point along the waveguide. Since the interaction is physical, it only depends on the ``incoming state'' (traveling-wave components) and the driving input signal.

Figure 2.15: Driving a digital waveguide with physical interaction between the driving input signal and the current state of the waveguide.

A less general but commonly encountered case is shown in Fig.2.16. This case requires the ``outgoing disturbance'' to be distributed equally to the left and right, and it sums with the incoming waves to produce the outgoing waves.

Figure 2.16: Symmetric outgoing disturbance in superposition with incoming waves.

Figure 2.17 shows a further reduction in generality--also commonly encountered--in which the interaction depends only on the amplitude of the simulated physical variable (such as string velocity or displacement). The incoming amplitude is formed as the sum of the incoming traveling-wave components. We will encounter examples of this nature in later chapters (such as Chapter 9). It provides realistic models of physical excitations such as a guitar plectra, violin bows, and woodwind reeds.

Figure: Same as Fig.2.15 for the case in which the interaction depends only upon incoming amplitude.

If an output signal is desired at this precise point, it can be computed as the incoming amplitude plus twice the outgoing disturbance signal (equivalent to summing the inputs of the two outgoing delay lines).

Note that the above examples all involve waveguide excitation at a single spatial point. While this can give a sufficiently good approximation to physical reality in many applications, one should also consider excitations that are spread out over multiple spatial samples (even just two).

We will develop the topic of digital waveguide modeling more systematically in Chapter 6 and Appendix C, among other places in this book. This section is intended only as a high-level preview and overview. For the next several chapters, we will restrict attention to normal signal processing structures in which signals may have physical units (such as acoustic pressure), and delay lines hold sampled acoustic waves propagating in one direction, but successive processing blocks do not ``load each other down'' or connect ``bidirectionally'' (as every truly physical interaction must, by Newton's third law3.6). Thus, when one processing block feeds a signal to a next block, an ``ideal output'' drives an ``ideal input''. This is typical in digital signal processing: Loading effects and return waves3.7 are neglected.3.8

Tapped Delay Line (TDL)

A tapped delay line (TDL) is a delay line with at least one ``tap''. A delay-line tap extracts a signal output from somewhere within the delay line, optionally scales it, and usually sums with other taps for form an output signal. A tap may be interpolating or non-interpolating. A non-interpolating tap extracts the signal at some fixed integer delay relative to the input. Thus, a tap implements a shorter delay line within a larger one, as shown in Fig.2.18.

Figure 2.18: A delay line with one tap.

Tapped delay lines efficiently simulate multiple echoes from the same source signal. As a result, they are extensively used in the field of artificial reverberation.

Example Tapped Delay Line

An example of a TDL with two internal taps is shown in Fig.2.19. The total delay line length is $ M_3$ samples, and the internal taps are located at delays of $ M_1$ and $ M_2$ samples, respectively. The output signal is a linear combination of the input signal $ x(n)$, the delay-line output $ x(n-M_3)$, and the two tap signals $ x(n-M_1)$ and $ x(n-M_2)$.

Figure 2.19: Tapped Delay Line (TDL).

The difference equation of the TDL in Fig.2.19 is, by inspection,

$\displaystyle y(n) = b_0 x(n) + b_{M_1} x(n-M_1) + b_{M_2} x(n-M_2) + b_{M_3} x(n-M_3)

corresponding to the transfer function

$\displaystyle H(z) = b_0 + b_{M_1} z^{-M_1} + b_{M_2} z^{-M_2} + b_{M_3} z^{-M_3}

Transposed Tapped Delay Line

Figure 2.20: Transposed Tapped Delay Line (TTDL).

In many applications, the transpose of a tapped delay line is desired, as shown in Fig.2.20, which is the transpose of the tapped delay line shown in Fig.2.19. A transposed TDL is obtained from a normal TDL by formal transposition of the system diagram. The transposition operation is also called flow-graph reversal [333, pp. 153-155]. A flow-graph is transposed by reversing all signal paths, which necessitates signal branchpoints becoming sums, and sums becoming branchpoints. For single-input, single-output systems, the transfer function is the same, but the input and output are interchanged. This ``flow-graph reversal theorem'' derives from Mason's gain formula for signal flow graphs. Transposition is used to convert direct-forms I and II of a digital filter to direct-forms III and IV, respectively [333].

TDL for Parallel Processing

When multiplies and additions can be performed in parallel, the computational complexity of a tapped delay line is $ {\cal O}(1)$ multiplies and $ {\cal O}(\lg(K))$ additions, where $ K$ is the number of taps. This computational complexity is achieved by arranging the additions into a binary tree, as shown in Fig.2.21 for the case $ K=4$.

Figure 2.21: An example Tapped Delay Line (TDL), with additions organized into a binary tree for maximized parallel computation.

General Causal FIR Filters

The most general case--a TDL having a tap after every delay element--is the general causal Finite Impulse Response (FIR) filter, shown in Fig.2.22. It is restricted to be causal because the output $ y(n)$ may not depend on ``future'' inputs $ x(n+1)$, $ x(n+2)$, etc. The FIR filter is also called a transversal filter. FIR filters are described in greater detail in [449].

Figure 2.22: The general, causal, finite-impulse-response (FIR) digital filter.

The difference equation for the $ M$th-order FIR filter in Fig.2.22 is, by inspection,

$\displaystyle y(n) = b_0 x(n) + b_1 x(n-1) + b_2 x(n-2) + b_3 x(n-3) + \cdots + b_M x(n-M)

and the transfer function is

$\displaystyle H(z) = b_0 + b_1 z^{-1} + b_2 z^{-2} + b_3 z^{-3} + \cdots + b_M z^{-M}
= \sum_{m=0}^M b_m z^{-m} \isdef B(z).

The STK class for implementing arbitrary direct-form FIR filters is called Fir. (There is also a class for IIR filters named Iir.) In Matlab and Octave, the built-in function filter is normally used.

Comb Filters

Comb filters are basic building blocks for digital audio effects. The acoustic echo simulation in Fig.2.9 is one instance of a comb filter. This section presents the two basic comb-filter types, feedforward and feedback, and gives a frequency-response analysis.

Feedforward Comb Filters

The feedforward comb filter is shown in Fig.2.23. The direct signal ``feeds forward'' around the delay line. The output is a linear combination of the direct and delayed signal.

Figure 2.23: The feedforward comb filter.

The ``difference equation'' [449] for the feedforward comb filter is

$\displaystyle y(n) = b_0 x(n) + b_M x(n-M). \protect$ (3.2)

We see that the feedforward comb filter is a particular type of FIR filter. It is also a special case of a TDL.

Note that the feedforward comb filter can implement the echo simulator of Fig.2.9 by setting $ b_0=1$ and $ b_M=g$. Thus, it is is a computational physical model of a single discrete echo. This is one of the simplest examples of acoustic modeling using signal processing elements. The feedforward comb filter models the superposition of a ``direct signal'' $ b_0 x(n)$ plus an attenuated, delayed signal $ b_M x(n-M)$, where the attenuation (by $ \vert b_M\vert<1$) is due to ``air absorption'' and/or spherical spreading losses, and the delay is due to acoustic propagation over the distance $ cMT$ meters, where $ T$ is the sampling period in seconds, and $ c$ is sound speed. In cases where the simulated propagation delay needs to be more accurate than the nearest integer number of samples $ M$, some kind of delay-line interpolation needs to be used (the subject of §4.1). Similarly, when air absorption needs to be simulated more accurately, the constant attenuation factor $ b_M$ can be replaced by a linear, time-invariant filter $ G(z)$ giving a different attenuation at every frequency. Due to the physics of air absorption, $ G(z)$ is generally lowpass in character [349, p. 560], [47,318].

Feedback Comb Filters

The feedback comb filter uses feedback instead of a feedforward signal, as shown in Fig.2.24 (drawn in ``direct form 2'' [449]).

Figure 2.24: The feedback comb filter.

A difference equation describing the feedback comb filter can be written in ``direct form 1'' [449] as3.9

$\displaystyle y(n) = b_0 x(n) - a_M y(n-M).

The feedback comb filter is a special case of an Infinite Impulse Response (IIR) (``recursive'') digital filter, since there is feedback from the delayed output to the input [449]. The feedback comb filter can be regarded as a computational physical model of a series of echoes, exponentially decaying and uniformly spaced in time. For example, the special case

$\displaystyle y(n) = x(n) + g\, y(n-M)

is a computational model of an ideal plane wave bouncing back and forth between two parallel walls; in such a model, $ g$ represents the total round-trip attenuation (two wall-to-wall traversals, including two reflections).

For stability, the feedback coefficient $ a_M$ must be less than $ 1$ in magnitude, i.e., $ \left\vert a_M\right\vert<1$. Otherwise, if $ \left\vert a_M\right\vert>1$, each echo will be louder than the previous echo, producing a never-ending, growing series of echoes.

Sometimes the output signal is taken from the end of the delay line instead of the beginning, in which case the difference equation becomes

$\displaystyle y(n) = b_M x(n-M) - a_M y(n-M) .

This choice of output merely delays the output signal by $ M$ samples.

Feedforward Comb Filter Amplitude Response

Comb filters get their name from the ``comb-like'' appearance of their amplitude response (gain versus frequency), as shown in Figures 2.25, 2.26, and 2.27. For a review of frequency-domain analysis of digital filters, see, e.g., [449].

Figure: Amplitude responses of the feed forward comb-filter $ H(z) = x(n) + g x(n-M)$ (diagrammed in Fig.2.23) with $ M=5$ and $ g=0.1$, $ 0.5$, and $ 0.9$. a) Linear amplitude scale. b) Decibel scale. The frequency axis goes from 0 to the sampling rate (instead of only half the sampling rate, which is more typical for real filters) in order to display the fact that the number of notches is exactly $ M=5$ (as opposed to ``$ 2.5$'').
\includegraphics[width=\twidth ]{eps/ffcfar}

The transfer function of the feedforward comb filter Eq.$ \,$(2.2) is

$\displaystyle H(z) = b_0+b_M\,z^{-M},$ (3.3)

so that the amplitude response (gain versus frequency) is

$\displaystyle G(\omega) \isdef \left\vert H(e^{j\omega})\right\vert = \left\vert b_0 + b_M e^{-j\omega M}\right\vert, \quad -\pi \leq \omega \leq \pi. \protect$ (3.4)

This is plotted in Fig.2.25 for $ M=5$, $ b_0=1$, and $ b_M=0.1$, $ 0.5$, and $ 0.9$. When $ b_0=b_M=1$, we get the simplified result

$\displaystyle G(\omega) = \left\vert 1 + e^{-j\omega M}\right\vert
= \left\vert... M/2}\right\vert
= 2\left\vert\cos\left(\omega\frac{M}{2}\right)\right\vert.

In this case, we obtain $ M$ nulls, which are points (frequencies) of zero gain in the amplitude response. Note that in flangers, these nulls are moved slowly over time by modulating the delay length $ M$. Doing this smoothly requires interpolated delay lines (see Chapter 4 and Chapter 5).

Feedback Comb Filter Amplitude Response

Figure 2.26 shows a family of feedback-comb-filter amplitude responses, obtained using a selection of feedback coefficients.

Figure: Amplitude response of the feedback comb-filter $ H(z) = 1/(1-g z^{-M})$ (Fig.2.24 with $ b_0=1$ and $ -a_M=g$) with $ M=5$ and $ g=0.1$, $ 0.5$, and $ 0.9$. a) Linear amplitude scale. b) Decibel scale.
\includegraphics[width=\twidth ]{eps/fbcfar}

Figure 2.27 shows a similar family obtained using negated feedback coefficients; the opposite sign of the feedback exchanges the peaks and valleys in the amplitude response.

Figure: Amplitude response of the phase-inverted feedback comb-filter, i.e., as in Fig.2.26 with negated $ g=-0.1$, $ -0.5$, and $ -0.9$. a) Linear amplitude scale. b) Decibel scale.
\includegraphics[width=\twidth ]{eps/fbcfiar}

As introduced in §2.6.2 above, a class of feedback comb filters can be defined as any difference equation of the form

$\displaystyle y(n) = x(n) + g\,y(n-M).

Taking the z transform of both sides and solving for $ H(z)\isdef Y(z)/X(z)$, the transfer function of the feedback comb filter is found to be

$\displaystyle H(z) = \frac{1}{1-g\,z^{-M}}, \protect$ (3.5)

so that the amplitude response is

$\displaystyle G(\omega) \isdef \left\vert H(e^{j\omega})\right\vert = \frac{1}{\left\vert 1 - g e^{-j\omega M}\right\vert}, \quad
-\pi \leq \omega \leq \pi .

This is plotted in Fig.2.26 for $ M=5$ and $ g=0.1$, $ 0.5$, and $ 0.9$. Figure 2.27 shows the same case but with the feedback sign-inverted.

For $ g=1$, the feedback-comb amplitude response reduces to

$\displaystyle G(\omega) = \frac{1}{2\left\vert\sin(\omega M/2)\right\vert},

and for $ g=-1$ to

$\displaystyle G(\omega) = \frac{1}{2\left\vert\cos(\omega M/2)\right\vert},

which exactly inverts the amplitude response of the feedforward comb filter with gain $ g=1$ (Eq.$ \,$(2.4)).

Note that $ g>0$ produces resonant peaks at

$\displaystyle \omega_k = 2\pi\frac{k}{M}, \quad k=0,1,2,\dots,M-1,

while for $ g<0$, the peaks occur midway between these values.

Filtered-Feedback Comb Filters

The filtered-feedback comb filter (FFBCF) uses filtered feedback instead of just a feedback gain.

Denoting the feedback-filter transfer function by $ H_l(z)$, the transfer function of the filtered-feedback comb filter can be written as

$\displaystyle H(z) = \frac{b_0}{1 - H_l(z)z^{-M}}.

Note that when $ H_l(z)$ is a causal filter, the FFBCF can be considered mathematically a special case of the general allpole transfer function in which the first $ M-1$ denominator coefficients are constrained to be zero:

$\displaystyle H(z) = \frac{b_0}{1 + 0 + \cdots + 0
+ a_M z^{-M} + a_{M+1} z^{-{M+1}} + \cdots}

It is this ``sparseness'' of the filter coefficients that makes the FFBCF more computationally efficient than other, more general-purpose, IIR filter structures.

In §2.6.2 above, we mentioned the physical interpretation of a feedback-comb-filter as simulating a plane-wave bouncing back and forth between two walls. Inserting a lowpass filter in the feedback loop further simulates frequency dependent losses incurred during a propagation round-trip, as naturally occurs in real rooms.

The main physical sources of plane-wave attenuation are air absorptionB.7.15) and the coefficient of absorption at each wall [349]. Additional ``losses'' for plane waves in real rooms occur due to scattering. (The plane wave hits something other than a wall and reflects off in many different directions.) A particular scatterer used in concert halls is textured wall surfaces. In ray-tracing simulations, reflections from such walls are typically modeled as having a specular and diffuse component. Generally speaking, wavelengths that are large compared with the ``grain size'' of the wall texture reflect specularly (with some attenuation due to any wall motion), while wavelengths on the order of or smaller than the texture grain size are scattered in various directions, contributing to the diffuse component of reflection.

The filtered-feedback comb filter has many applications in computer music. It was evidently first suggested for artificial reverberation by Schroeder [412, p. 223], and first implemented by Moorer [314]. (Reverberation applications are discussed further in §3.6.) In the physical interpretation [428,207] of the Karplus-Strong algorithm [236,233], the FFBCF can be regarded as a transfer-function physical-model of a vibrating string. In digital waveguide modeling of string and wind instruments, FFBCFs are typically derived routinely as a computationally optimized equivalent forms based on some initial waveguide model developed in terms of bidirectional delay-lines (``digital waveguides'') (see §6.10.1 for an example).

For stability, the amplitude-response of the feedback-filter $ H_l(z)$ must be less than $ 1$ in magnitude at all frequencies, i.e., $ \left\vert H_l(e^{j\omega T})\right\vert<1,\,\forall\omega T\in[-\pi,\pi)$.

Equivalence of Parallel Combs to TDLs

It is easy to show that the TDL of Fig.2.19 is equivalent to a parallel combination of three feedforward comb filters, each as in Fig.2.23. To see this, we simply add the three comb-filter transfer functions of Eq.$ \,$(2.3) and equate coefficients:

H(z) &=& \left(1+g_1 z^{-M_1}\right) +
\left(1+g_2 z^{-M_2}\...
...\right) \\
&=& 3 + g_1 z^{-M_1} + g_2 z^{-M_2} + g_3 z^{-M_3}

which implies

$\displaystyle b_0 = 3,\; b_{M_1} = g_1,\; b_{M_2} = g_2,\; b_{M_3} = g_3 .

We see that parallel comb filters require more delay memory ( $ M_1+M_2+M_3$ elements) than the corresponding TDL, which only requires $ \max(M_1,M_2,M_3)$ elements.

Equivalence of Series Combs to TDLs

It is also straightforward to show that a series combination of feedforward comb filters produces a sparsely tapped delay line as well. Considering the case of two sections, we have

H(z) &=& \left(1+g_1 z^{-M_1}\right) \left(1+g_2 z^{-M_2}\right)\\
&=& 1 + g_1 z^{-M_1} + g_2 z^{-M_2} + g_1 g_2 z^{-(M_1+M_2)}

which yields

$\displaystyle b_0 = 1,\; b_{M_1} = g_1,\; b_{M_2} = g_2,\; M_3=M_1+M_2,\;b_{M_3}=g_1 g_2.

Thus, the TDL of Fig.2.19 is equivalent also to the series combination of two feedforward comb filters. Note that the same TDL structure results irrespective of the series ordering of the component comb filters.

Time Varying Comb Filters

Comb filters can be changed slowly over time to produce the following digital audio ``effects'', among others:

Since all of these effects involve modulating delay length over time, and since time-varying delay lines typically require interpolation, these applications will be discussed after Chapter 5 which covers variable delay lines. For now, we will pursue what can be accomplished using fixed (time-invariant) delay lines. Perhaps the most important application is artificial reverberation, addressed in Chapter 3.

Feedback Delay Networks (FDN)

Figure 2.28: Order 3 MIMO Feedback Delay Network (FDN).

The FDN can be seen as a vector feedback comb filter,3.10obtained by replacing the delay line with a diagonal delay matrix (defined in Eq.$ \,$(2.10) below), and replacing the feedback gain $ g$ by the product of a diagonal matrix $ {\bm \Gamma}$ times an orthogonal matrix $ \mathbf{Q}$, as shown in Fig.2.28 for $ N=3$. The time-update for this FDN can be written as

$\displaystyle \left[\begin{array}{c} x_1(n) \\ [2pt] x_2(n) \\ [2pt] x_3(n)\end...
...gin{array}{c} u_1(n) \\ [2pt] u_2(n) \\ [2pt] u_3(n)\end{array}\right] \protect$ (3.6)

with the outputs given by

$\displaystyle \left[\begin{array}{c} y_1(n) \\ [2pt] y_2(n) \\ [2pt] y_3(n)\end...
...array}{c} x_1(n-M_1) \\ [2pt] x_2(n-M_2) \\ [2pt] x_3(n-M_3)\end{array}\right],$ (3.7)

or, in frequency-domain vector notation,
$\displaystyle \mathbf{X}(z)$ $\displaystyle =$ $\displaystyle {\bm \Gamma}\mathbf{Q}\mathbf{D}(z)\mathbf{X}(z) + \mathbf{U}(z)$ (3.8)
$\displaystyle \mathbf{Y}(z)$ $\displaystyle =$ $\displaystyle \mathbf{D}(z) \mathbf{X}(z)$ (3.9)


$\displaystyle \mathbf{D}(z) \isdef \left[\begin{array}{ccc} z^{-M_1} & 0 & 0\\ [2pt] 0 & z^{-M_2} & 0\\ [2pt] 0 & 0 & z^{-M_3} \end{array}\right]. \protect$ (3.10)

FDN and State Space Descriptions

When $ M_1=M_2=M_3=1$ in Eq.$ \,$(2.10), the FDN (Fig.2.28) reduces to a normal state-space model1.3.7),

$\displaystyle \mathbf{x}(n+1)$ $\displaystyle =$ $\displaystyle \mathbf{A}\, \mathbf{x}(n) + \mathbf{u}_+(n)$  
$\displaystyle \mathbf{y}_+(n)$ $\displaystyle =$ $\displaystyle \mathbf{x}(n)
\protect$ (3.11)

The matrix $ \mathbf{A}={\bm \Gamma}\mathbf{Q}$ is the state transition matrix. The vector $ \mathbf{x}(n) = [x_1(n), x_2(n), x_3(n)]^T$ holds the state variables that determine the state of the system at time $ n$. The order of a state-space system is equal to the number of state variables, i.e., the dimensionality of $ \mathbf{x}(n)$. The input and output signals have been trivially redefined as

\mathbf{u}_+(n) &\isdef & \mathbf{u}(n+1)\\
\mathbf{y}_+(n) &\isdef & \mathbf{y}(n+1)

to follow normal convention for state-space form.

Thus, an FDN can be viewed as a generalized state-space model for a class of $ N$th-order linear systems--``generalized'' in the sense that unit delays are replaced by arbitrary delays. This correspondence is valuable for analysis because tools for state-space analysis are well known and included in many software libraries such as with matlab.

Single-Input, Single-Output (SISO) FDN

When there is only one input signal $ u(n)$, the input vector $ \mathbf{u}(n)$ in Fig.2.28 can be defined as the scalar input $ u(n)$ times a vector of gains:

$\displaystyle \mathbf{u}(n) = \mathbf{B}u(n)

where $ \mathbf{B}$ is an $ N\times 1$ matrix. Similarly, a single output can be created by taking an arbitrary linear combination of the $ N$ components of $ \mathbf{y}(n)$. An example single-input, single-output (SISO) FDN for $ N=3$ is shown in Fig.2.29.

Figure 2.29: Order 3 SISO Feedback Delay Network (FDN).

Note that when $ M_1=M_2=M_3=1$, this system is capable of realizing any transfer function of the form

$\displaystyle H(z) = \frac{\beta_1z^{-1}+\beta_2z^{-2}+\beta_3z^{-3}}{1+a_1z^{-1}+a_2z^{-2}+a_3z^{-3}}.

By elementary state-space analysis [449, Appendix E], the transfer function can be written in terms of the FDN system parameters as

$\displaystyle H(z) = \mathbf{C}^T(z\mathbf{I}- \mathbf{A})^{-1}\mathbf{B}

where $ \mathbf{I}$ denotes the $ 3\times 3$ identity matrix. This is easy to show by taking the z transform of the impulse response of the system.

The more general case shown in Fig.2.29 can be handled in one of two ways: (1) the matrices $ \mathbf{A}, \mathbf{B}, \mathbf{C}$ can be augmented to order $ N=M_1+M_2+M_3$ such that the three delay lines are replaced by $ N$ unit-sample delays, or (2) ordinary state-space analysis may be generalized to non-unit delays, yielding

$\displaystyle H(z) = \mathbf{C}^T \mathbf{D}(z)\left[\mathbf{I}- \mathbf{A}\mathbf{D}(z)\right]^{-1}\mathbf{B}

where $ \mathbf{C}^T$ denotes the matrix transpose of $ \mathbf{C}$, and

$\displaystyle \mathbf{D}(z) \isdef \left[\begin{array}{ccc} z^{-M_1} & 0 & 0\\ [2pt] 0 & z^{-M_2} & 0\\ [2pt] 0 & 0 & z^{-M_3} \end{array}\right]. \protect$

In FDN reverberation applications, $ \mathbf{A}={\bm \Gamma}\mathbf{Q}$, where $ \mathbf{Q}$ is an orthogonal matrix, for reasons addressed below, and $ {\bm \Gamma}$ is a diagonal matrix of lowpass filters, each having gain bounded by 1. In certain applications, the subset of orthogonal matrices known as circulant matrices have advantages [385].

FDN Stability

Stability of the FDN is assured when some norm [451] of the state vector $ \mathbf{x}(n)$ decreases over time when the input signal is zero [220, ``Lyapunov stability theory'']. That is, a sufficient condition for FDN stability is

$\displaystyle \left\Vert\,\mathbf{x}(n+1)\,\right\Vert < \left\Vert\,\mathbf{x}(n)\,\right\Vert, \protect$ (3.12)

for all $ n\geq0$, where $ \left\Vert\,\mathbf{x}(n)\,\right\Vert$ denotes the norm of $ \mathbf{x}(n)$, and

$\displaystyle \mathbf{x}(n+1) = \mathbf{A}\left[\begin{array}{c} x_1(n-M_1) \\ [2pt] x_2(n-M_2) \\ [2pt] x_3(n-M_3)\end{array}\right].

Using the augmented state-space analysis mentioned above, the inequality of Eq.$ \,$(2.12) holds under the $ L2$ norm [451] whenever the feedback matrix $ \mathbf{A}$ in Eq.$ \,$(2.6) satisfies [473]

$\displaystyle \left\Vert\,\mathbf{A}\mathbf{x}\,\right\Vert _2 < \left\Vert\,\mathbf{x}\,\right\Vert _2 \protect$ (3.13)

for all $ \mathbf{x}$, where $ \left\Vert\,\cdot\,\right\Vert _2$ denotes the $ L2$ norm, defined by

$\displaystyle \left\Vert\,\mathbf{x}\,\right\Vert _2 \isdef \sqrt{x_1^2+x_2^2+\dots+x_N^2}.

In other words, stability is guaranteed when the feedback matrix decreases the $ L2$ norm of its input vector.

The matrix norm corresponding to any vector norm $ \vert\vert\,\cdot\,\vert\vert $ may be defined for the matrix $ \mathbf{A}$ as

$\displaystyle \left\Vert\,\mathbf{A}\,\right\Vert \isdef \max_{\mathbf{x}\neq \...

where $ \left\Vert\,\mathbf{x}\,\right\Vert$ denotes the norm of the vector $ \mathbf{x}$. In other words, the matrix norm ``induced'' by a vector norm is given by the maximum of $ \vert\vert\,\mathbf{A}\mathbf{x}\,\vert\vert $ over all unit-length vectors $ \mathbf{x}$ in the space. When the vector norm is the $ L2$ norm, the induced matrix norm is often called the spectral norm. Thus, Eq.$ \,$(2.13) can be restated as

$\displaystyle \left\Vert\,\mathbf{A}\,\right\Vert _2 < 1 \protect$ (3.14)

where $ \left\Vert\,\mathbf{A}\,\right\Vert _2$ denotes the spectral norm of $ \mathbf{A}$.

It can be shown [167] that the spectral norm of a matrix $ \mathbf{A}$ is given by the largest singular value of $ \mathbf{A}$ (`` $ \left\Vert\,\mathbf{A}\,\right\Vert _2=\sigma_1(\mathbf{A})$''), and that this is equal to the square-root of the largest eigenvalue of $ \mathbf{A}\mathbf{A}^T$, where $ \mathbf{A}^T$ denotes the matrix transpose of the real matrix $ \mathbf{A}$.3.11

Since every orthogonal matrix $ \mathbf{Q}$ has spectral norm 1,3.12 a wide variety of stable feedback matrices can be parametrized as

$\displaystyle \mathbf{A}= {\bm \Gamma}\mathbf{Q}

where $ \mathbf{Q}$ is any orthogonal matrix, and $ {\bm \Gamma}$ is a diagonal matrix having entries less than 1 in magnitude:

$\displaystyle {\bm \Gamma}= \left[ \begin{array}{cccc}
g_1 & 0 & \dots & 0\\
0 & 0 & \dots & g_N
\end{array}\right], \quad \left\vert g_i\right\vert<1.

An alternative stability proof may be based on showing that an FDN is a special case of a passive digital waveguide network (derived in §C.15). This analysis reveals that the FDN is lossless if and only if the feedback matrix $ \mathbf{A}$ has unit-modulus eigenvalues and linearly independent eigenvectors.

Allpass Filters

The allpass filter is an important building block for digital audio signal processing systems. It is called ``allpass'' because all frequencies are ``passed'' in the same sense as in ``lowpass'', ``highpass'', and ``bandpass'' filters. In other words, the amplitude response of an allpass filter is 1 at each frequency, while the phase response (which determines the delay versus frequency) can be arbitrary.

In practice, a filter is often said to be allpass if the amplitude response is any nonzero constant. However, in this book, the term ``allpass'' refers to unity gain at each frequency.

In this section, we will first make an allpass filter by cascading a feedback comb-filter with a feedforward comb-filter. This structure, known as the Schroeder allpass comb filter, or simply the Schroeder allpass section, is used extensively in the fields of artificial reverberation and digital audio effects. Next we will look at creating allpass filters by nesting them; allpass filters are nested by replacing delay elements (which are allpass filters themselves) with arbitrary allpass filters. Finally, we will consider the general case, and characterize the set of all single-input, single-output allpass filters. The general case, including multi-input, multi-output (MIMO) allpass filters, is treated in [449, Appendix D].

Allpass from Two Combs

Figure 2.30: A combined feedback/feedforward comb filter which gives an allpass filter when $ b_0 = a_M$.

An allpass filter can be defined as any filter having a gain of $ 1$ at all frequencies (but typically different delays at different frequencies).

It is well known that the series combination of a feedforward and feedback comb filter (having equal delays) creates an allpass filter when the feedforward coefficient is the negative of the feedback coefficient.

Figure 2.30 shows a combination feedforward/feedback comb filter structure which shares the same delay line.3.13 By inspection of Fig.2.30, the difference equation is

v(n) &=& x(n) - a_M v(n-M)\\
y(n) &=& b_0 v(n) + v(n-M).

This can be recognized as a digital filter in direct form II [449]. Thus, the system of Fig.2.30 can be interpreted as the series combination of a feedback comb filter (Fig.2.24) taking $ x(n)$ to $ v(n)$ followed by a feedforward comb filter (Fig.2.23) taking $ v(n)$ to $ y(n)$. By the commutativity of LTI systems, we can interchange the order to get

v(n) &=& b_0 x(n) + x(n-M)\\
y(n) &=& v(n) - a_M y(n-M).

Substituting the right-hand side of the first equation above for $ v(n)$ in the second equation yields more simply

$\displaystyle y(n) = b_0 x(n) + x(n-M) - a_M y(n-M). \protect$ (3.15)

This can be recognized as direct form I [449], which requires $ 2M$ delays instead of $ M$; however, unlike direct-form II, direct-form I cannot suffer from ``internal'' overflow--overflow can happen only at the output.

The coefficient symbols $ b_0$ and $ a_M$ here have been chosen to correspond to standard notation for the transfer function

$\displaystyle H(z) = \frac{b_0 + z^{-M}}{1 + a_M z^{-M}}.

The frequency response is obtained by setting $ z = e^{j\omega T}$, where $ \omega $ denotes radian frequency and $ T$ denotes the sampling period in seconds [449]. For an allpass filter, the frequency magnitude must be the same for all $ \omega\in[-\pi/T,\pi/T]$.

An allpass filter is obtained when $ b_0 = \overline{a_M}$, or, in the case of real coefficients, when $ b_0 = a_M$. To see this, let $ a\isdef
a_M=\overline{b_0}$. Then we have

$\displaystyle \left\vert H(e^{j\omega T})\right\vert
= \left\vert\frac{\overli...
...eft\vert\frac{\overline{a + e^{j\omega MT}}}{a+e^{j\omega MT}}\right\vert = 1.

Nested Allpass Filters

An interesting property of allpass filters is that they can be nested [412,152,153]. That is, if $ H_1(z)$ and $ H_2(z)$ denote unity-gain allpass transfer functions, then both $ H_1(H_2(z))$ and $ H_2(H_1(z))$ are allpass filters. A proof can be based on the observation that, since $ \vert H_i(e^{j\omega})\vert=1$, $ H_i(z)$ can be viewed as a conformal map [326] which maps the unit circle in the $ z$ plane to itself; therefore, the set of all such maps is closed under functional composition. Nested allpass filters were proposed for use in artificial reverberation by Schroeder [412, p. 222].

An important class of nested allpass filters is obtained by nesting first-order allpass filters of the form

$\displaystyle S_i(z) = \frac{k_i+z^{-1}}{1+k_iz^{-1}}.

The nesting begins with $ H_1(z)\isdef S_1(z)$, and $ H_2(z)$ is obtained by replacing $ z^{-1}$ in $ H_1(z)$ by $ z^{-1}S_2(z)$ to get

$\displaystyle H_2(z) \isdef S_1\left([z^{-1}S_2(z)]^{-1}\right) \isdef \frac{k_1+z^{-1}S_2(z)}{1+k_1z^{-1}S_2(z)}.

Figure 2.31a depicts the first-order allpass $ S_1(z)$ in direct form II. Figure 2.31b shows the same filter redrawn as a two-multiplier lattice filter section [297]. In the lattice form, it is clear that replacing $ z^{-1}$ by $ z^{-1}S_2(z)$ just extends the lattice to the right, as shown in Fig.2.32.

The equivalence of nested allpass filters to lattice filters has computational significance since it is well known that the two-multiply lattice sections can be replaced by one-multiply lattice sections [297,314].

Figure 2.31: First-order allpass filter: (a) Direct form II. (b) Two-multiply lattice section. (b) is just (a) folded over.

Figure 2.32: Second-order allpass filter: (a) Nested direct-form II. (b) Consecutive two-multiply lattice sections.

In summary, nested first-order allpass filters are equivalent to lattice filters made of two-multiply lattice sections. In §C.8.4, a one-multiply section is derived which is not only less expensive to implement in hardware, but it additionally has a direct interpretation as a physical model.

More General Allpass Filters

We have so far seen two types of allpass filters:

  • The series combination of feedback and feedforward comb-filters is allpass when their delay lines are the same length and their feedback and feedforward coefficents are the same. An example is shown in Fig.2.30.
  • Any delay element in an allpass filter can be replaced by an allpass filter to obtain a new (typically higher order) allpass filter. The special case of nested first-order allpass filters yielded the lattice digital filter structure of Fig.2.32.
We now discuss allpass filters more generally in the SISO case. (See Appendix D of [449] for the MIMO case.)

Definition: A linear, time-invariant filter $ H(z)$ is said to be lossless if it preserves signal energy for every input signal. That is, if the input signal is $ x(n)$, and the output signal is $ y(n) = (h\ast x)(n)$, we must have

$\displaystyle \sum_{n=-\infty}^{\infty} \left\vert y(n)\right\vert^2 =
\sum_{n=-\infty}^{\infty} \left\vert x(n)\right\vert^2.

In terms of the $ L2$ signal norm $ \left\Vert\,\,\cdot\,\,\right\Vert _2$, this can be expressed more succinctly as

$\displaystyle \left\Vert\,y\,\right\Vert _2^2 = \left\Vert\,x\,\right\Vert _2^2.

Notice that only stable filters can be lossless since, otherwise, $ \left\Vert\,y\,\right\Vert$ is generally infinite, even when $ \left\Vert\,x\,\right\Vert$ is finite. We further assume all filters are causal3.14 for simplicity. It is straightforward to show the following:

It can be shown [449, Appendix C] that stable, linear, time-invariant (LTI) filter transfer function $ H(z)$ is lossless if and only if

$\displaystyle \left\vert H(e^{j\omega})\right\vert = 1, \quad \forall \omega.

That is, the frequency response must have magnitude 1 everywhere over the unit circle in the complex $ z$ plane.

Thus, ``lossless'' and ``unity-gain allpass'' are synonymous. For an allpass filter with gain $ g$ at each frequency, the energy gain of the filter is $ g^2$ for every input signal $ x$. Since we can describe such a filter as an allpass times a constant gain, the term ``allpass'' will refer here to the case $ g=1$.

Example Allpass Filters

  • The simplest allpass filter is a unit-modulus gain

    $\displaystyle H(z) = e^{j\phi}

    where $ \phi$ can be any phase value. In the real case $ \phi$ can only be 0 or $ \pi$, in which case $ H(z)=\pm 1$.

  • A lossless FIR filter can consist only of a single nonzero tap:

    $\displaystyle H(z) = e^{j\phi} z^{-K}

    for some fixed integer $ K$, where $ \phi$ is again some constant phase, constrained to be 0 or $ \pi$ in the real-filter case. Since we are considering only causal filters here, $ K\geq 0$. As a special case of this example, a unit delay $ H(z)=z^{-1}$ is a simple FIR allpass filter.

  • The transfer function of every finite-order, causal, lossless IIR digital filter (recursive allpass filter) can be written as

    $\displaystyle H(z) = e^{j\phi} z^{-K} \frac{\tilde{A}(z)}{A(z)} \protect$ (3.16)

    where $ K\geq 0$, $ A(z) = 1 + a_1 z^{-1}+ a_2 z^{-2} + \cdots + a_N
z^{-N}$, and $ \tilde{A}(z)\isdef z^{-N}\overline{A}(z^{-1})$. The polynomial $ \tilde{A}(z)$ can be obtained by reversing the order of the coefficients in $ A(z)$ and conjugating them. (The factor $ z^{-N}$ serves to restore negative powers of $ z$ and hence causality.)

In summary, every SISO allpass filter can be expressed as the product of a unit-modulus gain factor, a pure delay, and an IIR transfer function in which the numerator is the ``flip'' of the denominator, as in Eq.$ \,$(2.16).

Gerzon Nested MIMO Allpass

An interesting generalization of the single-input, single-output Schroeder allpass filter (defined in §2.8.1) was proposed by Gerzon [157] for use in artificial reverberation systems.

The starting point can be the first-order allpass of Fig.2.31a on page [*], or the allpass made from two comb-filters depicted in Fig.2.30 on page [*].3.15In either case,

Let $ \underline{x}(n)$ denote the $ N\times 1$ input vector with components $ x_i(n), i=1,\dots,N$, and let $ \underline{X}(z)=[X_1(z),\dots,X_N(z)]$ denote the corresponding vector of z transforms. Denote the $ N\times 1$ output vector by $ \underline{y}(n)$. The resulting vector difference equation becomes, in the frequency domain (cf. Eq.$ \,$(2.15))

$\displaystyle \underline{Y}(z) = \overline{g} \underline{X}(z) + \mathbf{U}(z)\underline{X}(z) - g \mathbf{U}(z)\underline{Y}(z)

which leads to the matrix transfer function

$\displaystyle \mathbf{H}(z) = [\mathbf{I}+ g \mathbf{U}(z)]^{-1}[\overline{g}\mathbf{I}+ \mathbf{U}(z)]

where $ \mathbf{I}$ denotes the $ N\times N$ identity matrix, and $ \mathbf{U}(z)$ denotes any paraunitary matrix transfer function [500], [449, Appendix C].

Note that to avoid implementing $ \mathbf{U}(z)$ twice, $ \mathbf{H}(z)$ should be realized in vector direct-form II, viz.,

\underline{v}_d(n) &=& \mathbf{U}(d)\underline{v}(n) = {\cal Z...
...line{y}(n) &=& \underline{v}(n) + \overline{g}\underline{v}_d(n)

where $ d$ denotes the unit-delay operator ( $ d^k x(n)\isdef x(n-k)$).

To avoid a delay-free loop, the paraunitary matrix must include at least one pure delay in every row, i.e., $ \mathbf{U}(z) = z^{-1}
\mathbf{U}^\prime(z)$ where $ \mathbf{U}^\prime(z)$ is paraunitary and causal.

In [157], Gerzon suggested using $ \mathbf{U}(z)$ of the form

$\displaystyle \mathbf{U}(z) = \mathbf{D}(z) \mathbf{Q}

where $ \mathbf{Q}$ is a simple $ N\times N$ orthogonal matrix, and

$\displaystyle \mathbf{D}(z) = \left[ \begin{array}{ccccc} z^{-m_1} & 0 & 0 & \d...
...ts & \ddots& \vdots\\ 0 & 0 & 0 & \dots & z^{-m_N} \end{array} \right] \protect$ (3.17)

is a diagonal matrix of pure delays, with the lengths $ m_i$ chosen to be mutually prime (as suggested by Schroeder [417] for a series combination of Schroeder allpass sections). This structure is very close to the that of typical feedback delay networks (FDN), but unlike FDNs, which are ``vector feedback comb filters,'' the vectorized Schroeder allpass is a true multi-input, multi-output (MIMO) allpass filter.

Gerzon further suggested replacing the feedback and feedforward gains $ \pm g$ by digital filters $ \pm G(z)$ having an amplitude response bounded by 1. In principle, this allows the network to be arbitrarily different at each frequency.

Gerzon's vector Schroeder allpass is used in the IRCAM Spatialisateur [218].

Allpass Digital Waveguide Networks

We now describe the class of multi-input, multi-output (MIMO) allpass filters which can be made using closed waveguide networks. We will see that feedback delay networks can be obtained as a special case.

Signal Scattering

The digital waveguide was introduced in §2.4. A basic fact from acoustics is that traveling waves only happen in a uniform medium. For a medium to be uniform, its wave impedance3.17must be constant. When a traveling wave encounters a change in the wave impedance, it will reflect, at least partially. If the reflection is not total, it will also partially transmit into the new impedance. This is called scattering of the traveling wave.

Let $ R_1$ denote the constant impedance in some waveguide, such as a stretched steel string or acoustic bore. Then signal scattering is caused by a change in wave impedance from $ R_1$ to $ R_2$. We can depict the partial reflection and transmission as shown in Fig.2.33.

Figure 2.33: Signal scattering at a junction of different wave impedances $ R_1$ & $ R_2$.

The computation of reflection and transmission in both directions, as shown in Fig.2.33 is called a scattering junction.

As derived in Appendix C, for force or pressure waves, the reflection coefficient $ k_1$ is given by

$\displaystyle k_1 = \frac{R_2-R_1}{R_2+R_1} \protect$ (3.18)

That is, the coefficient of reflection for a traveling pressure wave leaving impedance $ R_1$ and entering impedance $ R_2$ is given by the impedance step over the impedance sum. The reflection coefficient $ k_1$ fully characterizes the scattering junction.

For velocity traveling waves, the reflection coefficient is just the negative of that for force/pressure waves, or $ -k_1$ (see Appendix C).

Signal scattering is lossless, i.e., wave energy is neither created nor destroyed. An implication of this is that the transmission coefficient for a traveling pressure wave leaving impedance $ R_1$ and entering impedance $ R_2$ is given by

$\displaystyle t_1 = 1 + k_1.

For velocity waves, the transmission coefficient is $ t_1 = 1-k_1$, which is perhaps more intuitive.

Digital Waveguide Networks

A Digital Waveguide Network (DWN) consists of any number of digital waveguides interconnected by scattering junctions. For example, when two digital waveguides are connected together at their endpoints, we obtain a two-port scattering junction as shown in Fig.2.33. When three or more waveguides are connected at a point, we obtain a multiport scattering junction, as discussed in §C.8. In other words, a digital waveguide network is formed whenever digital waveguides having arbitrary wave impedances are interconnected. Since DWNs are lossless, they provide a systematic means of building a very large class of MIMO allpass filters.

Consider the following question:

Under what conditions may I feed a signal from one point inside a given allpass filter to some other point (adding them) without altering signal energy at any frequency?
In other words, how do we add feedback paths anywhere and everywhere, thereby maximizing the richness of the recursive feedback structure, while maintaining an overall allpass structure?

The digital waveguide approach to allpass design [430] answers this question by maintaining a physical interpretation for all delay elements in the system. Allpass filters are made out of lossless digital waveguides arranged in closed, energy conserving networks. See Appendix C for further discussion.

Next Section:
Artificial Reverberation
Previous Section:
Introduction to Physical Signal Models