Artificial Reverberation

This chapter summarizes basic results in systems for artificial reverberation. Such systems make extensive use of tapped delay lines, comb filters, and allpass filters (described in the previous chapter).

The Reverberation Problem

Consider the requirements for acoustically simulating a concert hall or other listening space. Suppose we only need the response at one or more discrete listening points in space (``ears'') due to one or more discrete point sources of acoustic energy.

First, as discussed in §2.2, the direct signal propagating from a sound source to a listener's ear can be simulated using a single delay line in series with an attenuation scaling or lowpass filter. Second, each sound ray arriving at the listening point via one or more reflections can be simulated using a delay-line and some scale factor (or filter). Two rays create a feedforward comb filter, like the one in Fig.2.9 on page [*]. More generally, a tapped delay line (Fig.2.19) can simulate many reflections. Each tap brings out one echo at the appropriate delay and gain, and each tap can be independently filtered to simulate air absorption and lossy reflections. In principle, tapped delay lines can accurately simulate any reverberant environment, because reverberation really does consist of many paths of acoustic propagation from each source to each listening point. As we will see, the only limitations of a tapped delay line are (1) it is expensive computationally relative to other techniques, (2) it handles only one ``point to point'' transfer function, i.e., from one point-source to one ear,4.1 and (3) it should be changed when the source, listener, or anything in the room moves.

Exact Reverb via Transfer-Function Modeling

Figure 3.1 depicts the general reverberation scenario for three sources and one listener (two ears). In general, the filters should also include filtering by the pinnae of the ears, so that each echo can be perceived as coming from the correct angle of arrival in 3D space; in other words, at least some reverberant reflections should be spatialized so that they appear to come from their natural directions in 3D space [248]. Again, the filters change if anything changes in the listening space, including source or listener position. The artificial reverberation problem is then to implement some approximation of the system in Fig.3.1.

Figure 3.1: General reverberation simulation for three sources and one listener (two ears). Each filter $ h_{ij}$ can be implemented as a tapped delay line FIR filter.

In the frequency domain, it is convenient to express the input-output relationship in terms of the transfer-function matrix:

$\displaystyle \left[\begin{array}{c} Y_1(z) \\ [2pt] Y_2(z) \end{array}\right] ...
...left[\begin{array}{c} S_1(z) \\ [2pt] S_2(z) \\ [2pt] S_3(z)\end{array}\right]

Denoting the impulse response of the filter from source $ j$ to ear $ i$ by $ h_{ij}(n)$, the two output signals in Fig.3.1 are computed by six convolutions:

$\displaystyle y_i(n) = \sum_{j=1}^3 s_j \ast h_{ij}(n) =
\sum_{j=1}^3 \sum_{m=0}^{M_{ij}} s_j(m)h_{ij}(n-m), \quad i=1,2,

where $ M_{ij}$ denotes the order of FIR filter $ h_{ij}$. Since many of the filter coefficients $ h_{ij}(n)$ are zero (at least for small $ n$), it is more efficient to implement them as tapped delay lines2.5) so that the inner sum becomes sparse. For greater accuracy, each tap may include a lowpass filter which models air absorption [314] and/or spherical spreading loss (see §2.3). For large $ n$, the impulse responses are not sparse, and we must either implement very expensive FIR filters, or approximate the tail of the impulse response using less expensive IIR filters; this subject--``late reverberation'' approximation--is taken up in §3.4.

Complexity of Exact Reverberation

For music, a typical reverberation time4.2is on the order of one second. Suppose we choose exactly one second for the reverberation time. At an audio sampling rate of 50 kHz, each filter in Fig.3.1 requires 50,000 multiplies and additions per sample, or 2.5 billion multiply-adds per second. Handling three sources and two listening points (ears), we reach 30 billion operations per second for the reverberator. This computational load would require at least 10 Pentium CPUs clocked at 3 Gigahertz, assuming they were doing nothing else, and assuming both a multiply and addition can be initiated each clock cycle, with no wait-states caused by the three required memory accesses (input, output, and filter coefficient). While these numbers can be improved using FFT convolution instead of direct convolution (at the price of introducing a throughput delay which can be a problem for real-time systems), it remains the case that exact implementation of all relevant point-to-point transfer functions in a reverberant space is very expensive computationally.

While a tapped delay line FIR filter can provide an accurate model for any point-to-point transfer function in a reverberant environment, it is rarely used for this purpose in practice because of the extremely high computational expense. While there are specialized commercial products that implement reverberation via direct convolution of the input signal with the impulse response, the great majority of artificial reverberation systems use other methods to synthesize the late reverb more economically.

Possibility of a Physical Reverb Model

One disadvantage of the point-to-point transfer function model depicted in Fig.3.1 is that some or all of the filters must change when anything moves. If instead we had a computational model of the whole acoustic space, sources and listeners could be moved as desired without affecting the underlying room simulation. Furthermore, we could use ``virtual dummy heads'' as listeners, complete with pinnae filters, so that all of the 3D directional aspects of reverberation could be captured in two extracted signals for the ears. Thus, there are compelling reasons to consider a full 3D model of a desired acoustic listening space.

Let us briefly estimate the computational requirements of a ``brute force'' acoustic simulation of a room. It is generally accepted that audio signals require a 20 kHz bandwidth. Since sound travels at about a foot per millisecond (see §B.7.14 for a more precise value), a 20 kHz sinusoid has a wavelength on the order of 1/20 feet, or about half an inch. Since, by elementary sampling theory, we must sample faster than twice the highest frequency present in the signal, we need ``grid points'' in our simulation separated by a quarter inch or less. At this grid density, simulating an ordinary 12'x12'x8' room in a home requires more than 100 million grid points. Using finite-difference (Appendix D) or waveguide-mesh techniques (§C.14,Appendix E) [518,396], the average grid point can be implemented as a multiply-free computation; however, since it has waves coming and going in six spatial directions, it requires on the order of 10 additions per sample. Thus, running such a room simulator at an audio sampling rate of 50 kHz requires on the order of 50 billion additions per second, which is comparable to the three-source, two-ear simulation of Fig.3.1. However, scaling up to a 100'x50'x20' concert hall requires more than 5 quadrillion operations per second. We may conclude, therefore, that a fine-grained physical model of a complete concert hall over the audio band is prohibitively expensive.

The remainder of this chapter will be concerned with ways of reducing computational complexity without sacrificing too much perceptual quality.

Perceptual Aspects of Reverberation

Artificial reverberation is an unusually interesting signal processing problem because, as discussed in the previous sections, the ``obvious'' methods based on physical modeling or input-output modeling are too expensive computationally for most applications. This leads to the question of what are the perceptually important aspects of reverberation, and how can these be provided by efficient computational structures.

Perception of Echo Density and Mode Density

The reverberation problem can be greatly simplified without sacrificing perceptual quality. For example, it can be shown4.3that for typical rooms, the echo density increases as $ t^2$, where $ t$ is time. Therefore, beyond some time, the echo density is so great that it can be modeled as some uniformly sampled stochastic process without loss of perceptual fidelity. In particular, there is no need to explicitly compute multiple echoes per sample of sound. For smoothly decaying late reverb (the desired kind), an appropriate random process sampled at the audio sampling rate will sound equivalent perceptually.

Similarly, it can be shown4.4that the number of resonant modes in any given frequency band increases as frequency squared, so that above some frequency, the modes are so dense that they are perceptually equivalent to a random frequency response generated according to some statistics. In particular, there is no need to explicitly implement resonances so densely packed that the ear cannot hear them all.

In summary, we see that, based on limits of perception, the impulse response of a reverberant room can be divided into two segments. The first segment, called the early reflections, consists of the relatively sparse first echoes in the impulse response. The remainder, called the late reverberation, is so densely populated with echoes that it is best to characterize the response statistically in some way. Section 3.3 discusses methods for simulating early reflections in the reverberation impulse response.

Similarly, the frequency response of a reverberant room can be divided into two segments. The low-frequency interval consists of a relatively sparse distribution of resonant modes, while at higher frequencies the modes are packed so densely that they are best characterized statistically as a random frequency response with certain (regular) statistical properties. Section 3.4 describes methods for synthesizing hiqh quality late reverberation.

Perceptual Metrics for Ideal Reverberation

Some desirable controls for an artificial reverberator include [218]

The time to decay 60 dB ($ t_{60}$) is a classical objective parameter used as a measure of perceived reverberation time. Classically, $ t_{60}$ was measured for the whole response. More recently [216], it has become more common to design for a given $ t_{60}$ at more than one frequency, e.g., one for low frequencies, another for high frequencies, and interpolated values at intermediate frequencies. Perceptual studies indicate that reverberation time should be independently adjustable in at least three frequency bands [217].

Energy Decay Curve

For measuring and defining reverberation time $ t_{60}$, Schroeder introduced the so-called energy decay curve (EDC) which is the tail integral of the squared impulse response at time $ t$:

$\displaystyle \hbox{EDC}(t) \isdef \int_t^\infty h^2(\tau)d\tau

Thus, $ \hbox{EDC}(t)$ is the total amount of signal energy remaining in the reverberator impulse response at time $ t$. The EDC decays more smoothly than the impulse response itself, and so it is more useful than ordinary amplitude envelopes for estimating $ t_{60}$.

Energy Decay Relief

The energy decay relief (EDR) is a time-frequency distribution which generalizes the EDC to multiple frequency bands [215]:

$\displaystyle \hbox{EDR}(t_n,f_k) \isdef \sum_{m=n}^M \left\vert H(m,k)\right\vert^2

where $ H(m,k)$ denotes bin $ k$ of the short-time Fourier transform (STFT) at time-frame $ m$ [12,451], and $ M$ denotes the total number of time frames. The FFT within the STFT is typically used with a window, such as a Hann window of length 30 or 40 ms.

Thus, $ \hbox{EDR}(t_n,f_k)$ is the total amount of signal energy remaining in the reverberator's impulse response at time $ t_n=nT$ in a frequency band centered about $ f_k=kf_s/N$ Hz, where $ N$ denotes the FFT length.

The EDR of a violin-body impulse response is shown in Fig.3.2. For better correspondence with audio perception, the frequency axis is warped to the Bark frequency scale [459], and energy is summed within each Bark band (one critical band of hearing equals one Bark). A violin body can be regarded as a very small reverberant room, with correspondingly ``magnified'' spectral structure relative to reverberant rooms.

Figure 3.2: Energy Decay Relief of a violin-body impulse response (from [203]).

The EDR of the Boston Symphony Hall is displayed in [153, p. 96].

The EDR is used to measure partial overtone dampings from recordings of a vibrating string in §6.11.5.

Early Reflections

The ``early reflections'' portion of the impulse response of a reverberant environment is often taken to be the first 100ms or so [314]. However, for greater accuracy, it should be extended to the time at which the reverberation reaches its asymptotic statistical behavior.4.5

Since the early reflections are relatively sparse and span a relatively short time, they are often implemented using tapped delay lines (TDL).4.6

If the computation is affordable, it is best to spatialize the early reflections [248] so that they come from the right directions in 3D space. It is known [61] that the early reflections have a strong influence on spatial impression, i.e., the listener's perception of the listening-space shape.

Figure 3.3 shows a general schematic of a reverberator with separate implementations of early and late reverberation. The taps on the TDL may include lowpass filtering for simulation of air absorption.

While the late reverb logically begins when the early reflections end, as implemented in Fig.3.3, it may be more cost-effective in practice to feed the ``late reverb'' unit from an earlier tap (or set of taps) from the TDL, thus overlapping them somewhat. This can help when the late reverberator needs time to build up to full density.

Figure 3.3: Division of a reverberator into early and late sections.

It is often the case that early reflections can be worked into the late-reverberation simulation. For example, usually there are long delay lines in which the input signal can be summed at various points, thereby implementing a transposed tapped delay line (see §2.5.2).

Good concert halls are observed to have to have stereo-recorded impulse responses that quickly ``lateralize'', with a smooth decay and overall duration of approximately 1.9 seconds [48].

Late Reverberation Approximations

Desired Qualities in Late Reverberation

From a perceptual standpoint, the main qualities desired of a good late-reverberation impulse response are

  1. a smooth (but not too smooth) decay, and
  2. a smooth (but not too regular) frequency response.
Providing an exponential decay is no problem since both stable linear systems and natural reverberation decay exponentially. The harder issue is making it smooth, that is, free of ``flutter,'' ``beating,'' or other unnatural irregularities. In general, smooth decay results when the echo density is sufficiently high. Note, however, that some fluctuation in the short-term energy is needed to sound natural [58,104], corresponding to that of a decaying noise process [314].4.7

A smooth frequency response exhibits no large, isolated gaps or boosts. It is generally provided when the mode density is sufficiently large in the frequency domain, with the modes being spread out uniformly, as opposed to piling up in the same place or separated to form gaps. On the other hand, the modes should not be too regularly spaced, since this can produce audible periodicity in the time-domain impulse response.

An interesting experiment by Moorer [314] was to try exponentially decaying white noise as a late reverberation impulse response. This signal satisfies both smoothness criteria (time domain and frequency domain), and it sounds quite natural. However, since natural reverberation decays faster at high frequencies, it is better to say that the ideal late reverberation impulse response is exponentially decaying ``colored'' noise, with the high-frequency energy decaying faster than the low-frequency energy.

Schroeder's rule of thumb for echo density in the late reverb is 1000 echoes per second or more [417]. However, for impulsive sounds, 10,000 echoes per second or more may be necessary for a smooth response [217,153].

Schroeder Allpass Sections

Manfred Schroeder's original papers on the use of allpass filters for artificial reverberation [417,412,152,153] started a lively thread of research which continues to the present. For many years thereafter, digital reverberation algorithms were designed along the lines suggested by Schroeder using delay lines, comb filters, and allpass filters--elements described in Chapter 2. There was even special-purpose hardware developed to implement these structures efficiently in real time [393]. Today, these elements continue to serve as the basis for commercial devices for artificial reverberation and related effects [104]. They are also still typically used in software for artificial reverberation [86]. We will see some examples starting in §3.5 below.

Schroeder's suggested use of allpass filters was especially brilliant because there is nothing in nature to suggest them. Instead, he recognized the conceptual and practical utility of separating the coloration of reverberation from its duration and density aspects. While Schroeder's 1961 paper is entitled ``Colorless Artificial Reverberation,'' there is no such thing as colorless (exactly allpass) reverberation in the real world. However, it makes sense as an idealization of natural reverb. Colorless reverberation is an idealization only possible in the ``virtual world''.

In Schroeder's original work, and in much work which followed, allpass filters are arranged in series, as shown in Fig.3.4.

Figure 3.4: A cascade of three Schroeder allpass sections. A typical value for $ g$ is 0.7. The delay-line lengths $ M_i$ are typically mutually prime and spanning successive orders of magnitude, e.g., $ 1051,337,113$.

Each allpass can be thought of as expanding each nonzero input sample from the previous stage into an entire infinite allpass impulse response. For this reason, Schroeder allpass sections are sometimes called impulse expanders [483] or impulse diffusers. While not a physical model of diffuse reflection, single reflections are expanded into many reflections, which is qualitatively what is desired.

Another interesting interpretation of a Schroeder allpass section is as a digital waveguide model of the driving-point impedance of an ideal string (or cylindrical acoustic tube) which is reflectively terminated at a real impedance. That is, if the input signal is regarded as samples of a physical velocity, then the output signal is proportional to samples of the corresponding force (or pressure) at the same physical point. The delay line contains traveling-wave samples; the first half corresponds to traveling waves heading toward the far end of the string (or tube), while the second half holds traveling-wave samples coming back the other way toward the driving point. For the mathematical details of this interpretation, see Appendix C.

Nested Allpass Filters

Another common method for increasing the density of an allpass impulse response is to nest two or more allpass filters, as described in §2.8.2 and shown in Fig.2.32 on page [*]. In general, a nested allpass filter is created when one or more of its delay elements is replaced by another allpass filter. As we saw in §2.8.2, first-order nested allpass filters are equivalent to lattice filters. This equivalence implies that any order $ N$ transfer function (any $ N$ poles and zeros) may be obtained from a linear combination of the delay elements of nested first-order allpass filters, since this is a known property of the lattice filter [297].

In general, any delay-element or delay-line inside a stable allpass-filter can be replaced with any stable allpass-filter, and the result will be a stable allpass.

Schroeder Reverberators

The subject of artificial reverberation was initiated in the early 1960s by Manfred Schroeder [417,412]. Early Schroeder reverberators consisted of the following elements [412]:

Figure: A Schroeder reverberator we will call JCREV developed by Prof. John Chowning, founding director of CCRMA (drawn from a 1972 MUS10 software listing, where MUS10 was an ``acoustic compiler'' language descended from Music V [305]).

An example is shown in Fig.3.5, where, in that figure,

$\displaystyle \hbox{AP}_{N}^{\,g} \isdefs \frac{-g + z^{-N}}{1 - g z^{-N}} \protect$ (4.2)

denotes a Schroeder allpass section with delay length $ N$ samples and coefficients $ g$ (see Fig.2.30 and associated discussion),

$\displaystyle \hbox{FBCF}_{N}^{\,g} \isdefs \frac{1}{1-g\,z^{-N}} \protect$ (4.3)

denotes a feedback comb filter with delay length $ N$ and coefficient $ g$ (diagrammed in Fig.2.24), and MM in Fig.3.5 denotes the mixing matrix

   MM$\displaystyle \eqsp \left[\begin{array}{rrrr}
1 & 1 & 1 & 1 \\ [2pt]
-1 & -1 & -1 & -1 \\ [2pt]
-1 & 1 & -1 & 1 \\ [2pt]
1 & -1 & 1 & -1

which can be efficiently implemented using four adders and two negations:
\mbox{\texttt{OutA}} &=& s_1+s_2\\
\mbox{\texttt{OutB}} &=& -...
...& -\mbox{\texttt{OutD}}\\
\mbox{\texttt{OutD}} &=& s_1 - s_2\\

s_1 &=& x_1+x_3\\
s_2 &=& x_2+x_4\\

As discussed above in §3.4.2, the allpass filters provide ``colorless'' high-density echoes in the late impulse response of the reverberator [417]. These allpass filters may also be referred to as diffusers. While allpass filters are ``colorless'' in theory, perceptually, their impulse responses are only colorless when they are extremely short (less than 10 ms or so). Longer allpass impulse responses sound similar to feedback comb-filters. For steady-state tones, however, such as sinusoids, the allpass property gives the same gain at every frequency, unlike comb filters.

Schroeder [412, p. 221] suggests a progression of allpass delay-line lengths close to

$\displaystyle M_iT \approx \frac{100\hbox{ ms}}{3^i}, \quad i=0,1,2,3,4,

and chosen to be mutually prime (no common factors). The $ 100$ ms value was chosen so that when $ g=0.708$ in Eq.$ \,$(3.2), the time to decay 60 dB ($ t_{60}$) would be 2 seconds. Thus, for $ i=0$, $ t_{60}\approx 2$, and each successive allpass has an impulse-response duration that is about a third of the previous one. Using 5 series allpasses in this way yields an impulse-response echo density of about 810 per second, which is close to the desired thousand per second [412, p. 221].

The parallel comb-filter bank is intended to give a psychoacoustically appropriate fluctuation in the reverberator frequency response. As discussed in Chapter 22.6.2), a feedback comb filter can simulate a pair of parallel walls, so one could choose the delay-line length in each comb filter to be the number of samples it takes for a plane wave to propagate from one wall to the opposite wall and back. However, in his original paper [412], Schroeder describes a more psychoacoustically motivated approach:

``There are about 15 large response peaks in every 100 cps [Hz] interval for a room with 1 sec reverberation time. Thus, one might hope that if an artificial reverberator has a comparable number of response peaks it might sound just as good as a real room. We have been able to confirm this expectation by subjective evaluations of the responses of reverberators consisting of several comb filters ... connected in parallel. For a delay of 0.04 sec, the number of response peaks per 100 cps [Hz] is 4. Thus, between 3 and 4 comb filters in parallel ... with incommensurate delays, are required to approximate the number of peaks in the frequency reponse of a room having a reverberation time of T[60] = 1 sec. Also, the open loop gain of the comb filters should not exceed about 0.85 or -1.4 dB to keep the response fluctuations from being excessive.''
Thus, one may choose the comb-filter delay-line lengths more or less arbitrarily, and then use enough of them in parallel (with mutually prime delay-line lengths) to achieve a perceptually adequate fluctuation density in the frequency-response magnitude. In [412], four such delays are chosen between 30 and 45 ms, and the corresponding feedback coefficients $ g_i$ are set to give the desired overall decay time.

The delay lengths shown in Fig.3.7 were optimized by ear by John Chowning (and perhaps others at CCRMA) for an audio sampling rate of $ f_s=25$ kHz.

Finally, for multichannel listening, Schroeder suggested [412] a mixing matrix at the reverberator output. The goal of the mixing matrix is to bring out any number of uncorrelated audio channels of reverberation (for any number of output speakers) [153, p. 111-112].

Example Schroeder Reverberators

Additional example Schroeder Reverberators, drawn from CCRMA software listings, are shown in Figures 3.6 and 3.7. The notation used in the figures is explained above in Equations (3.2-3.3).

Figure: Schroeder reverberator SATREV by Prof. John Chowning at CCRMA (drawn from a 1971 MUS10 software listing).

Figure 3.7: A later Schroeder reverberator for the Samson Box [393] at CCRMA, based on (and still called) JCREV.

A software musical instrument using one of these reverberators simply adds its output, suitably scaled, to the real-time variable RevIn (the global reverberator input sample at the current time). In both examples, we again see three Schroeder allpass filters in series (Schroeder suggested five, as disussed above).

The rather small reverberator of Fig.3.6 is thought to have been used in John Chowning's often-heard FM-brass canon sound examples.4.8Like the more computationally expensive four-channel-audio example in Fig.3.5, it is designed for a 25 kHz sampling rate. Its ``mixing matrix'' is simply a negation of the right stereo channel.

In addition to the allpass chain in Figures 3.6 and 3.7, there is a parallel bank of four feedback comb filters. Since all of the filters are linear and time-invariant [449], the series allpass chain can go either before or after the parallel comb-filter bank. Unlike the allpass filters, the comb filters have an irregular magnitude frequency response, and they can be considered a simulation of four specific echo sequences. The delay lengths in these comb filters may be used to adjust the illusion of ``room size'', although if they are shortened, there should be more of them in parallel, according to Schroeder's quote above.

In Fig.3.7, the reverbator output signal RevOut is fed to the four audio output channels via four delay lines. These delay lengths are specified relative to the sampling rate $ f_s$.4.9 Output delay lines can substitute for or supplement a mixing matrix as a means for decorrelating the reverberation output channels (to minimize reverberation imaging between speakers). In this particular case, however, the delays are evidently not optimized for decorrelation. Also, for purposes of decorrelation, the shortest delay can be subtracted from the other three and its corresponding delay-line eliminated.

A Schroeder reverberator along the above lines may be found in the Synthesis Tool Kit (STK) [86]. See files JCRev.cpp and JCRev.h.

The reverberators shown in Figures 3.5 and 3.6 above are included in the Faust distribution. See functions jcrev, satrev, and reverb_demo in the effect.lib library. While these Schroeder reverberators are quite small by today's standards, they are well tuned for their size. More commonly used today is freeverb (included in the examples directory of the Faust distribution as freeverb.dsp), discussed next.


A more recently developed Schroeder reverberator is ``Freeverb'' -- a public domain C++ program by ``Jezar at Dreampoint'' used extensively in the free-software world. It uses four Schroeder allpasses in series and eight parallel Schroeder-Moorer filtered-feedback comb-filters2.6.5) for each audio channel, and is said to be especially well tuned.

Figure: Freeverb block diagram (left stereo channel). The Schroeder-Moorer lowpass-feedback-comb-filter, denoted $ \protect\hbox{LBCF}_{N}^{\,f,\,d}$ in the figure, is defined in §3.6.2 below. The use of three summers instead of one is for drawing convenience.

Figure 3.8 shows the default signal-processing settings for the Freeverb left stereo channel. Processing for the right channel is obtained by adding an integer to each of the twelve delay-line lengths. This integer is called stereospread, and its default value is 23. (See the file tuning.h for all constants and default values used by Freeverb.) Different software distributions may include slightly different default values in tuning.h. The values in Fig.3.8 were found in the ladspa-cmt-plugins package (``Computer Music Toolkit'') which is included in the Planet CCRMA distribution, and which is based on the June 2000 version of Freeverb, as of this writing. There are at least six more instances of Freeverb in the Planet CCRMA distribution alone.4.10

Freeverb Main Loop

The C++ code for the main processing loop of Freeverb is shown in Fig.3.9. Notice that it sums the two stereo input channels to create a mono signal that is fed to the reverberator, which then computes a stereo output signal.

Figure 3.9: Freeverb processing function.

void revmodel::processreplace(float *inputL, float *inputR,
  float *outputL, float *outputR, long numsamples, int skip)
  float outL,outR,input;
  int i;

  while(numsamples-- > 0)
      outL = outR = 0;
      input = (*inputL + *inputR) * gain;

      // Accumulate comb filters in parallel
      for(i=0; i<numcombs; i++) {
        outL += combL[i].process(input);
        outR += combR[i].process(input);

      // Feed through allpasses in series
      for(i=0; i<numallpasses; i++) {
        outL = allpassL[i].process(outL);
        outR = allpassR[i].process(outR);

      // Calculate output REPLACING anything already there
      *outputL = outL*wet1 + outR*wet2 + *inputL*dry;
      *outputR = outR*wet1 + outL*wet2 + *inputR*dry;

      // Increment sample pointers, allowing for interleave
      // (if any)
      inputL += skip; // For stereo buffers, skip = 2
      inputR += skip;
      outputL += skip;
      outputR += skip;

From the code in Fig.3.9, we see that the left and right reverberator output channels outL and outR are combined with the left and right input channels inputL and inputR as follows:

$\displaystyle \left[\begin{array}{c} \texttt{outputL} \\ [2pt] \texttt{outputR}...
...\left[\begin{array}{c} \texttt{outL} \\ [2pt] \texttt{outR} \end{array}\right]

The dry parameter (initially set to 0 in tuning.h, and typically changed using a graphical user interface (GUI)), determines how much of the original (stereo) input signal is mixed to the output, and wet1 and wet2 determine how much of the reverberated signal is mixed to the output, and also how much ``stereo separation'' occurs in the reverberation. In particular, if $ \texttt{wet1}=\texttt{wet2}$, there is no stereo separation since the same reverberator output signal is sent to the left and right output channels. (Note that there is no decorrelating delay-line like we saw at the output of JCRev in Fig.3.7.) Setting $ \texttt{wet2}=0$ yields maximally different left and right reverberation signals, thus maximizing ``stereo separation'' in the reverb. The default values, determined by tuning.h and revmodel.cpp, are $ \texttt{wet1}=1$ and $ \texttt{wet2}=0$.

Lowpass-Feedback Comb Filter

Inspection of comb.h in the Freeverb source shows that Freeverb's ``comb'' filter is more specifically a lowpass-feedback-comb filter (LBCF4.11--§2.6.5). It is constructed using a delay line whose output is lowpass-filtered and summed with the delay-line's input. The particular lowpass used in Freeverb is a unity-gain one-pole lowpass having the transfer function

$\displaystyle H(z) = \frac{1-d}{1-d\,z^{-1}}.

When $ d=0$, the LBCF reduces to the feedback comb filter (FBCF) of §2.6.2 in which the feedback was not filtered. The overall LBCF transfer function is then

$\displaystyle \hbox{LBCF}_{N}^{\,f,\,d} \;\isdef \; \frac{1}{1 - f\frac{1-d}{1-d\,z^{-1}}\,z^{-N}}.

This structure was introduced for artificial reverberation by Schroeder [412] and Moorer [314].

In Freeverb's comb section (comb.h and comb.cpp), the ``damping'' $ d$ is set initially to

$\displaystyle d = \texttt{damp = initialdamp * scaledamp} = 0.5 \cdot 0.4 = 0.2\; .

The lowpass scale-factor $ f$ is called feedback in the source, and it is set initially to

f &=& \texttt{roomsize = initialroom * scaleroom + offsetroom}\\
&=& 0.5 \cdot 0.28 + 0.7 = 0.84\;.

Increasing the roomsize parameter (typically brought out to a GUI slider) increases $ f$ and hence the reverberation time. Since $ f<1$ is required for dc stability, the roomsize must be less than 1.0714, and so the GUI slider max is typically 1 ($ f=0.98$).

The feedback variable $ f$ mainly determines reverberation time at low-frequencies at which the feedback lowpass has negligible effect. The feedback lowpass causes the reverberation time to decrease with frequency, which is natural. At very high frequencies--those for which the lowpass gain times $ f$ is much less than 0.5--the reverberation time becomes dominated by the diffusion allpass filters (which have a fixed feedback coefficient of $ g=0.5$). Thus, in Freeverb, the ``room size'' parameter can be interpreted as setting the low-frequency T60 (time to decay 60 dB), while the ``damping'' parameter controls how rapidly T60 shortens as a function of increasing frequency. A lower-limit on T60 is given by the four diffusion allpass filters.

In terms of the physical interpretation of the filtered-feedback comb-filter discussed in §2.6.5, Freeverb's roomsize parameter can be interpreted as the square-root of the low-frequency reflection-coefficient of each wall. That is, when a planewave bounces back and forth between two walls, the attenuation coefficient is roomsize after one round trip (two wall reflections). Therefore, a better name in this interpretation would be liveness or reflectivity. Since the round-trip delay is given in samples by the delay-line length, changing the roomsize requires changing the delay-line lengths in this interpretation.

Freeverb Allpass Approximation

In Eq.$ \,$(3.2) we defined the allpass notation $ \hbox{AP}_{N}^{\,g}$ by

$\displaystyle \hbox{AP}_{N}^{\,g} \isdef \frac{-g + z^{-N}}{1 - g z^{-N}}

A look at allpass.h reveals that Freeverb implements

$\displaystyle \hbox{AP}_{N}^{\,g} \approx \frac{-1 + (1+g)z^{-N}}{1 - g z^{-N}}.

As a result, each of the four Freeverb ``allpass'' sections is really a feedback comb-filter $ \hbox{FBCF}_{N}^{\,g}$ in series with a feedforward comb-filter $ \hbox{FFCF}_{N}^{\,-1,1+g}$, where (cf. §2.6)

\hbox{FBCF}_{N}^{\,g} &\isdef & \frac{1}{1 - g\,z^{-N}}\\ [5pt]
\hbox{FFCF}_{N}^{\,-1,1+g} &\isdef & -1 + (1+g)z^{-N}.

A true allpass is obtained only for $ g=(\sqrt{5}-1)/2\approx 0.618$ (reciprocal of the ``golden ratio''). The default value used in Freeverb (see revmodel.cpp) is $ 0.5$. A detailed discussion of feedforward and feedback comb filters appears in §2.6, and corresponding Schroeder allpass filters are described in §2.8.


Since Freeverb is a Schroeder-Moorer reverberator, and such reverbs have been around since the 1970s, its relatively recent success underscores the value of careful parameter tuning (typically by ear, but automatic optimizations are possible).

The idea of synthesizing right-channel processing from left-channel processing to obtain ``stereo spreading'' by enlarging all the delay lines by a fixed amount appears to have been introduced by Freeverb.

FDN Reverberation

Feedback Delay Networks (FDN) were introduced earlier in §2.7. An example is shown in Fig.2.29 on page [*]. After a brief historical summary, this section will cover some practical considerations for the use of FDNs as reverberators.

History of FDNs for Artificial Reverberation

Feedback delay networks were first suggested for artificial reverberation by Gerzon [156], who proposed an ``orthogonal matrix feedback reverberation unit''. He noted that individual feedback comb filters yielded poor quality, but that several such filters could sound good when cross-coupled. An ``orthogonal matrix feedback'' around a parallel bank of delay lines was suggested as a means of obtaining maximally rich cross-coupling. He was especially concerned with good stereo spreading of the reverberation at a time when most artificial reverberators sought merely to decorrelate the reverberation in each output channel.

Later, and apparently independently, Stautner and Puckette [473] suggested a specific four-channel FDN reverberator and gave general stability conditions for the FDN. They proposed the feedback matrix

$\displaystyle \mathbf{A}= g\frac{1}{\sqrt{2}}
0 & 1 &...
... 0 \\
-1 & 0 & 0 & -1\\
1 & 0 & 0 & -1\\
0 & 1 & -1 & 0

which can be understood as a permutation (with one row sign inversion) of a $ 4\times4$ Hadamard matrix (see §3.7.2).

More recently, Jot [217,216] developed a systematic FDN design methodology allowing largely independent setting of reverberation time in different frequency bands. Using Jot's methodology, FDN reverberators can be polished to a high degree of quality, and they are presently considered to be among the best choices for high-quality artificial reverberation.

Jot's early work was concerned only with single-input, single-output (SISO) reverberators. Later work [218] with Jullien and others at IRCAM was concerned also with spatializing the reverberation.

Figure 3.10: Feedback Delay Network (FDN) structure proposed for artificial reverberation by Jot.

An example FDN reverberator using three delay lines is shown in Fig.3.10. It can be seen as an FDN (introduced in §2.7), plus an additional low-order filter $ E(z)$ applied to the non-direct signal. This filter is called a ``tonal correction'' filter by Jot, and it serves to equalize modal energy irrespective of the reverberation time in each band. In other words, if the decay time is made very short in some band, $ E(z)$ will have a large gain in that band so that the total energy in the band's impulse-response is unchanged. This is another example of orthogonalization of reverberation parameters: In this case, adjustments in reverberation time, in any frequency band, do not alter total signal energy in the impulse response in that band.

Choice of Lossless Feedback Matrix

As mentioned in §3.4, an ``ideal'' late reverberation impulse response should resemble exponentially decaying noise [314]. It is therefore useful when designing a reverberator to start with an infinite reverberation time (the ``lossless case'') and work on making the reverberator a good ``noise generator''. Such a starting point is ofen referred to as a lossless prototype [153,430]. Once smooth noise is heard in the impulse response of the lossless prototype, one can then work on obtaining the desired reverberation time in each frequency band (as will be discussed in §3.7.4 below).

In reverberators based on feedback delay networks (FDNs), the smoothness of the ``perceptually white noise'' generated by the impulse response of the lossless prototype is strongly affected by the choice of FDN feedback matrix as well as the (ideally mutually prime) delay-line lengths in the FDN (discussed further in §3.7.3 below). Following are some of the better known feedback-matrix choices.

Hadamard Matrix

A second-order Hadamard matrix may be defined by

$\displaystyle \mathbf{H}_2 \isdef
1 & 1\\
-1 & 1

with higher order Hadamard matrices defined by recursive embedding, e.g.,

$\displaystyle \mathbf{H}_4 \isdef
1& 1& 1&1\\
-1& 1&-1&1\\
-1&-1& 1&1\\

When $ n$ is a power of $ 4$, the Hadamard matrix $ \mathbf{H}_n$ of that order requires no multiplies in fixed-point arithmetic. An $ n\times
n$ Hadamard matrix has the maximum possible determinant of any $ n\times
n$ complex matrix containing elements which are bounded by $ 1$ in magnitude. This can be seen as an optimal mixing and scattering property of the matrix.

As of version 0.9.30, Faust's math.lib4.12contains a function called hadamard(n) for generating an $ n\times
n$ Hadamard matrix, where $ n$ must be a power of $ 2$. A Hadamard feedback matrix is used in the programming example reverb_designer.dsp (a configurable FDN reverberator) distributed with Faust.

A Hadamard feedback matrix is said to be used in the IRCAM Spatialisateur [218].

Householder Feedback Matrix

One choice of lossless feedback matrix $ \mathbf{A}_N$ for FDNs, especially nice in the $ 4\times4$ case, is a specific Householder reflection proposed by Jot [217]:

$\displaystyle \mathbf{A}_N = \mathbf{I}_N - \frac{2}{N}\uv_N\uv_N^T \protect$ (4.4)

where $ \uv_N^T = [1, 1, \dots, 1]$ can be interpreted as the specific vector about which the input vector is reflected in $ N$-dimensional space (followed by a sign inversion). More generally, the identity matrix $ \mathbf{I}_N$ can be replaced by any $ N\times N$ permutation matrix [153, p. 126].

It is interesting to note that when $ N$ is a power of 2, no multiplies are required [430]. For other $ N$, only one multiply is required (by $ 2/N$).

Another interesting property of the Householder reflection $ \mathbf{A}_N$ given by Eq.$ \,$(3.4) (and its permuted forms) is that an $ N\times N$ matrix-times-vector operation may be carried out with only $ 2N-1$ additions (by first forming $ \uv_N^T$ times the input vector, applying the scale factor $ 2/N$, and subtracting the result from the input vector). This is the same computation as physical wave scattering at a junction of identical waveguidesC.8).

An example implementation of a Householder FDN for $ N=3$ is shown in Fig.3.11. As observed by Jot [153, p. 216], this computation is equivalent to $ N$ parallel feedback comb filters with one new feedback path from the output to the input through a gain of $ -2/N$.

Figure 3.11: FDN using a Householder-reflection feedback matrix.

A nice feature of the Householder feedback matrix $ A_N$ is that for $ N\neq 2$, all entries in the matrix are nonzero. This means every delay line feeds back to every other delay line, thereby helping to maximize echo density as soon as possible.

Furthermore, for $ N=4$, all matrix entries have the same magnitude:

$\displaystyle \mathbf{A}_4 = \frac{1}{2}
1 & -1 & -1 ...
-1 & 1 & -1 & -1\\
-1 & -1 & 1 & -1\\
-1 & -1 & -1 & 1

Only the $ N=4$ case is ``balanced'' in this way. For larger $ N$, the diagonal becomes larger than the off-diagonal elements, and as $ N$ becomes very large, the FDN approaches a bank of decoupled parallel comb filters.

Due to the elegant balance of the $ N=4$ Householder feedback matrix, Jot [216] proposes an $ N=16$ FDN based on an embedding of $ N=4$ feedback matrices:

$\displaystyle \mathbf{A}_{16} = \frac{1}{2}
...\mathbf{A}_4 & -\mathbf{A}_4 & -\mathbf{A}_4 & \mathbf{A}_4

Another method is to replace each of the four delay lines in an FDN(4) by a Gerzon vector allpass (see §2.8.5) which is $ 4\times4$ and contains four delay lines.

Householder Reflections

For completeness, this section derives the Householder reflection matrix from geometric considerations [451]. Let $ \mathbf{P}_{\underline{u}}$ denote the projection matrix which orthogonally projects vectors onto $ {\underline{u}}$, i.e.,

$\displaystyle \mathbf{P}_{\underline{u}}= \frac{\underline{u}\,\underline{u}^T}...


$\displaystyle \mathbf{P}_{\underline{u}}\, \underline{x}= \underline{u}\,\frac{...

specifically projects $ \underline{x}$ onto $ \underline{u}$. Since the projection is orthogonal, we have

$\displaystyle \left<\underline{x}-\mathbf{P}_{\underline{u}}\underline{x},\unde...

We may interpret $ (\mathbf{I}-\mathbf{P}_{\underline{u}})\underline{x}$ as the difference vector between $ \underline{x}$ and $ \mathbf{P}_{\underline{u}}\underline{x}$, its orthogonal projection onto $ \underline{u}$, since

$\displaystyle (\mathbf{I}-\mathbf{P}_{\underline{u}})\underline{x}+ \mathbf{P}_{\underline{u}}\underline{x}= \underline{x}

and we have $ (\mathbf{I}-\mathbf{P}_{\underline{u}})\underline{x}\perp \mathbf{P}_{\underline{u}}\underline{x}$ by definition of the orthogonal projection. Consequently, the projection onto $ \underline{u}$ minus this difference vector gives a reflection of the vector $ \underline{x}$ about $ \underline{u}$:

$\displaystyle \underline{y}= \mathbf{P}_{\underline{u}}\underline{x}- (\mathbf{...
...line{u}})\underline{x}= (2\mathbf{P}_{\underline{u}}- \mathbf{I})\underline{x}

Thus, $ \underline{y}$ is obtained by reflecting $ \underline{x}$ about $ \underline{u}$--a so-called Householder reflection.

Most General Lossless Feedback Matrices

As shown in §C.15.3, an FDN feedback matrix $ \mathbf{A}_N$ is lossless if and only if its eigenvalues have modulus 1 and its $ N$ eigenvectors are linearly independent.

A unitary matrix $ Q$ is any (complex) matrix that is inverted by its own (conjugate) transpose:

$\displaystyle Q^{-1} = Q^H,

where $ Q^H$ denotes the Hermitian conjugate (i.e., the complex-conjugate transpose) of $ Q$. When $ Q$ is real (as opposed to complex), we may simply call it an orthogonal matrix, and we write $ Q^{-1} = Q^T$, where $ T$ denotes matrix transposition.

All unitary (and orthogonal) matrices have unit-modulus eigenvalues and linearly independent eigenvectors. As a result, when used as a feedback matrix in an FDN, the resulting FDN will be lossless (until the delay-line damping filters are inserted, as discussed in §3.7.4 below).

Triangular Feedback Matrices

An interesting class of feedback matrices, also explored by Jot [216], is that of triangular matrices. A basic fact from linear algebra is that triangular matrices (either lower or upper triangular) have all of their eigenvalues along the diagonal.4.13 For example, the matrix

$\displaystyle \mathbf{A}_3 = \left[\begin{array}{ccc}
\lambda_1 & 0 & 0\\ [2pt]
a & \lambda_2 & 0\\ [2pt]
b & c & \lambda_3

is lower triangular, and its eigenvalues are $ (\lambda_1,
\lambda_2,\lambda_3)$ for all values of $ a$, $ b$, and $ c$.

It is important to note that not all triangular matrices are lossless. For example, consider

$\displaystyle \mathbf{A}_2 = \left[\begin{array}{cc} 1 & 0 \\ [2pt] 1 & 1 \end{array}\right]

It has two eigenvalues equal to 1, which looks lossless, but a quick calculation shows that there is only one eigenvector, $ [0,1]^T$. This happens because this matrix is a Jordan block of order 2 corresponding to the repeated eigenvalue $ \lambda=1$. A direct computation shows that

$\displaystyle \mathbf{A}_2^n = \left[\begin{array}{cc} 1 & 0 \\ [2pt] n & 1 \end{array}\right]

which is clearly not lossless.

One way to avoid ``coupled repeated poles'' of this nature is to use non-repeating eigenvalues. Another is to convert $ \mathbf{A}$ to Jordan canonical form by means of a similarity transformation, zero any off-diagonal elements, and transform back [329].

Choice of Delay Lengths

Following Schroeder's original insight, the delay line lengths in an FDN ($ M_i$ in Fig.3.10) are typically chosen to be mutually prime. That is, their prime factorizations contain no common factors. This rule maximizes the number of samples that the lossless reverberator prototype must be run before the impulse response repeats.

The delay lengths $ M_i$ should be chosen to ensure a sufficiently high mode density in all frequency bands. An insufficient mode density can be heard as ``ringing tones'' or an uneven amplitude modulation in the late reverberation impulse response.

Mean Free Path

A rough guide to the average delay-line length is the ``mean free path'' in the desired reverberant environment. The mean free path is defined as the average distance a ray of sound travels before it encounters an obstacle and reflects. An approximate value for the mean free path, due to Sabine, an early pioneer of statistical room acoustics, is

$\displaystyle {\overline d} = 4\frac{V}{S}\qquad\hbox{(mean free path)}

where $ V$ is the total volume of the room, and $ S$ is total surface area enclosing the room. This approximation requires the diffuse field assumption, i.e., that plane waves are traveling randomly in all directions [349,47] (see §3.2.1 for a simple construction). Normally, late reverberation satisfies this assumption well, away from open doors and windows, provided the room is not too ``dead''. Regarding each delay line as a mean-free-path delay, the average can be set to the mean free path by equating

$\displaystyle \frac{{\overline d}}{cT} = \frac{1}{N} \sum_{i=1}^N M_i

where $ c$ denotes sound speed and $ T$ denotes the sampling period. This number should be treated as a lower bound because in real rooms reflections are often diffuse, especially at high frequencies. In a diffuse reflection, a single incident plane wave reflects in many directions at once.

Mode Density Requirement

A guide for the sum of the delay-line lengths is the desired mode density. The sum of delay-line lengths $ M_i$ in a lossless FDN is simply the order of the system $ M$:

$\displaystyle M \isdef \sum_{i=1}^N M_i\qquad\hbox{(FDN order)}

The order increases slightly when lowpass filters are introduced after the delay lines to achieve a specific reverberation time at low and high frequencies (as described in the next subsection).

Since the order of a system equals the number of poles, we have that $ M$ is the number of poles on the unit circle in the lossless prototype. If the modes were uniformly distributed, the mode density would be $ M/f_s=MT$ modes per Hz. Schroeder [417] suggests that, for a reverberation time of 1 second, a mode density of 0.15 modes per Hz is adequate. Since the mode widths are inversely proportional to reverberation time, the mode density for a reverberation time of 2 seconds should be 0.3 modes per Hz, etc. In summary, for a sufficient mode density in the frequency domain, Schroeder's formula is

$\displaystyle M \geq 0.15 t_{60}f_s

For a sampling rate of 50 kHz and a reverberation time ($ t_{60}$) equal to 1 second, we obtain $ M\geq 7500$.

Prime Power Delay-Line Lengths

When the delay-line lengths need to be varied in real time, or interactively in a GUI, it is convenient to choose each delay-line length $ \hat{M}_i$ as an integer power of a distinct prime number $ p_i$ [457]:

$\displaystyle \hat{M}_i \isdefs p_i^{m_i}

where we call $ m_i\ge 1$ the ``multiplicity'' of the prime $ p_i$. With this choice, the delay-line lengths are always coprime (no factors in common other than $ 1$), and yet we can lengthen or shorten each delay line individually (by factors of $ p_i$) without affecting the mutually prime property.

Suppose we are initially given desired delay-line lengths $ M_i$ arranged in ascending order so that

$\displaystyle M_1 < M_2 < \cdots < M_N.

Then good prime-power approximations $ \hat{M}_i$ can be expected using the prime numbers in their natural order:

$\displaystyle p_i \in \{2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,\ldots\}

Since $ \hat{M}_i=p_i^{m_i} \,\,\Rightarrow\,\,\log(\hat{M}_i) = m_i \log(p_i)$ (for any logarithmic base), an optimal (in some sense) choice of prime multiplicity $ m_i$ is

$\displaystyle m_i
\isdefs$   round$\displaystyle \left[\frac{\log(M_i)}{\log(p_i)}\right]
\isdefs \left\lfloor 0.5 + \frac{\log(M_i)}{\log(p_i)}\right\rfloor.

where $ M_i$ is the desired length in samples. That is, $ m_i$ can be simply obtained by rounding $ \log(M_i)/\log(p_i)$ to the nearest integer (max 1). The prime-power delay-line length approximation is then of course

$\displaystyle \hat{M}_i \isdefs p_i^{m_i},

and the multiplicative approximation error is bounded by $ p_i^{\pm1/2}$ (when $ M_i\ge\sqrt{p_i}$).

This prime-power length scheme is used to keep 16 delay lines both variable and mutually prime in Faust's reverb_designer.dsp programming example (via the function prime_power_delays in effect.lib).

Achieving Desired Reverberation Times

A lossless prototype reverberator, as in Fig.3.10 when $ g_i=1$, has all of its poles on the unit circle in the $ z$ plane, and its reverberation time is infinity. To set the reverberation time to a desired value, we need to move the poles slightly inside the unit circle. Furthermore, due to air absorption (§2.3B.7.15), we want the high-frequency poles to be more damped than the low-frequency poles [314]. As discussed in §2.3, this type of transformation can be obtained using the substitution

$\displaystyle z^{-1}\leftarrow G(z)z^{-1}, \protect$ (4.5)

where $ G(z)$ denotes the filtering per sample in the propagation medium (a lowpass filter with gain not exceeding 1 at all frequencies).4.14Thus, to set the FDN reverberation time to $ t_{60}(\omega)\isdeftext n_{60}(\omega)T$ at frequency $ \omega $, we want propagation through $ n_{60}$ samples to result in attenuation by $ 60$ dB, i.e.,

$\displaystyle \left[G(e^{j\omega T})\right]^{n_{60}(\omega)} \eqsp 0.001. \protect$ (4.6)

Solving for $ G$, the propagation attenuation per-sample, gives
$\displaystyle G(e^{j\omega T})$ $\displaystyle =\!$ $\displaystyle (0.001)^{\frac{1}{n_{60}(\omega)}}
\eqsp 10^{-3/n_{60}}
\eqsp \left(e^{\mbox{ln}(10)}\right)^{-3/n_{60}} \eqsp e^{-3\,\mbox{ln}(10)/n_{60}}$  
  $\displaystyle =\!$ $\displaystyle e^{-T/\tau(\omega)}
\protect$ (4.7)

The last form comes from $ t_{60}=3$ln$ (10)\tau\approx 6.91\tau$, where $ \tau $ denotes the time constant of decay (time to decay by $ 1/e$) [451], i.e.,

$\displaystyle e^{-t_{60}/\tau}=0.001 \;\;\Leftrightarrow\;\; t_{60}\eqsp -3$ln$\displaystyle (10)\tau. \protect$ (4.8)

Series expanding $ e^{-T/\tau(\omega)}$ and assuming $ n_{60}(\omega)\gg 7$ samples ( $ \tau(\omega)\gg T$ seconds) provides the practically useful approximation

&\!=\!& 1 - \frac{T}{\tau(\omega)} + \frac...
... \frac{3\mbox{ln}(10)}{n_{60}}
\approxs 1 - \frac{6.91}{n_{60}}.

Conformal Map Interpretation of Damping Substitution

The relation $ G(e^{j\omega T})\approx e^{-T/\tau(\omega)}$ [Eq.$ \,$(3.7)] can be written down directly from $ z^{-1}\leftarrow G(z)z^{-1}$ [Eq.$ \,$(3.5)] by interpreting Eq.$ \,$(3.5) as an approximate conformal map [326] which takes each pole $ p_k=e^{j\omega_kT}$, say, from the unit circle to the point $ p'_k\approx G(e^{j\omega_kT})e^{-j\omega_kT}$. Thus, the new pole radius is approximately $ \vert p'_k\vert\approx\vert G(e^{j\omega_kT})\vert$, where the approximation is valid when $ G(z)$ is approximately constant between the new pole location and the unit circle. To see this, consider the partial fraction expansion [449] of a proper $ N$th-order lossless transfer function $ H(z)$ mapped to $ H'(z)\isdeftext H[z/G(z)]$:

$\displaystyle H'(z)
= \sum_{k=1}^N \frac{r_k}{1-p_kG(z)z^{-1}}
= \sum_{k=1}^N r_k\left[1+p_kG(z)z^{-1}+p_k^2G^2(z)z^{-2}+\cdots\right],

where $ p_k=\exp(j\omega_k T)$ denotes the $ k$th original pole on the unit circle. Then $ H'(z)$ has a pole at $ z'_k=p_kG(z'_k)$, which must be solved iteratively for $ z'_k$, in general, since $ G(z)$ can be a complicated function of $ z$. However, if $ G(z'_k)\approx G(p_k)$, which is typically true when damping digital waveguides for music applications, then $ z'_k\approx p_kG(p_k)=G[\exp(j\omega_k
T)]\exp(j\omega_k T)$. In other words, we can think of the pole $ p_k$ as moving from $ \exp(j\omega_k T)$ to near $ G[\exp(j\omega_k
T)]\exp(j\omega_k T)$, provided it doesn't move too far compared with the near-constant behavior of $ G(z)$. Another way to say it is that we need $ G(z)$ to be approximately the same at the new pole location and its initial location on the unit circle in the lossless prototype.

Happily, while we may not know precisely where our poles have moved as a result of introducing the per-sample damping filter $ G(z)$, the relation $ G^{n_{60}(\omega)}(e^{j\omega T})=0.001$ [Eq.$ \,$(3.6)] remains exact at every frequency by construction, as it is based only on the physical interpretation of each unit delay as a propagation delay for a plane wave across one sampling interval $ T$, during which (zero-phase) filtering by $ G(z)$ is assumed (§2.3). More generally, we can design minimum-phase filters for which $ \vert G^{n_{60}(\omega)}(e^{j\omega T})\vert=0.001$, and neglect the resulting phase dispersion.

In summary, we see that replacing $ z^{-1}$ by $ G(z)z^{-1}$ everywhere in the FDN lossless prototype (or any lossless LTI system for that matter) serves to move its poles away from the unit circle in the $ z$ plane onto some contour inside the unit circle that provides the desired decay time at each frequency.

A general design guideline for artificial reverberation applications [217] is that all pole radii in the reverberator should vary smoothly with frequency. This translates to $ G(z)$ having a smooth frequency response. To see why this is desired, consider momentarily the frequency-independent case in which we desire the same reverberation time at all frequencies (Fig.3.10 with real $ g_i\le 1$, as drawn). In this case, it is ideal for all of the poles to have this decay time. Otherwise, the late decay of the impulse response will be dominated by the poles having the largest magnitude, and it will be ``thinner'' than it was at the beginning of the response when all poles were contributing to the output. Only when all poles have the same magnitude will the late response maintain the same modal density throughout the decay.

Damping Filters for Reverberation Delay Lines

In an FDN, such as the one shown in Fig.3.10, delays $ z^{-1}$ appear in long delay-line chains $ z^{-M_i}$. Therefore, the filter needed at the output (or input) of the $ i$th delay line is $ G^{M_i}(z)$ (replace $ g_i$ by $ G^{M_i}(z)$ in Fig.3.10).4.15 However, because $ G(e^{j\omega T})$ is so close to $ 1$ in magnitude, and because it varies so weakly across the frequency axis, we can design a much lower-order filter $ H_i(z)\approx G^{M_i}(z)$ that provides the desired attenuation versus frequency to within psychoacoustic resolution. In fact, perfectly nice reverberators can be designed in which $ H_i(z)$ is merely first order for each $ i$ [314,217].

Delay-Line Damping Filter Design

Let $ t_{60}(\omega)$ denote the desired reverberation time at radian frequency $ \omega $, and let $ H_i(z)$ denote the transfer function of the lowpass filter to be placed in series with the $ i$th delay line which is $ M_i$ samples long. The problem we consider now is how to design these filters to yield the desired reverberation time. We will specify an ideal amplitude response for $ H_i(z)$ based on the desired reverberation time at each frequency, and then use conventional filter-design methods to obtain a low-order approximation to this ideal specification.

In accordance with Eq.$ \,$(3.6), the lowpass filter $ H_i(z)$ in series with a length $ M_i$ delay line should approximate

$\displaystyle H_i(z) = G^{M_i}(z)

which implies

$\displaystyle \left\vert H_i(e^{j\omega T})\right\vert^{\frac{t_{60}(\omega)}{M_iT}} = 0.001.

Taking $ 20\log_{10}$ of both sides gives

$\displaystyle 20 \log_{10}\left\vert H_i(e^{j\omega T})\right\vert = -60 \frac{M_i T}{t_{60}(\omega)}. \protect$ (4.9)

This is the same formula derived by Jot [217] using a somewhat different approach.

Now that we have specified the ideal delay-line filter $ H_i(e^{j\omega T})$ in terms of its amplitude response in dB, any number of filter-design methods can be used to find a low-order $ H_i(z)$ which provides a good approximation to satisfying Eq.$ \,$(3.9). Examples include the functions invfreqz and stmcb in Matlab. Since the variation in reverberation time is typically very smooth with respect to $ \omega $, the filters $ H_i(z)$ can be very low order.

First-Order Delay-Filter Design

The first-order case is very simple while enabling separate control of low-frequency and high-frequency reverberation times. For simplicity, let's specify $ t_{60}(0)$ and $ t_{60}(\pi/T)$, denoting the desired decay-time at dc ($ \omega=0$) and half the sampling rate ( $ \omega=\pi/T$). Then we have determined the coefficients of a one-pole filter:

$\displaystyle H_i(z) = \frac{g_i}{1-p_iz^{-1}}

The dc gain of this filter is $ H_i(1)=g_i/(1-p_i)$, while the gain at $ \omega=\pi/T$ is $ H_i(-1)=g_i/(1+p_i)$. From Eq.$ \,$(3.9) (or Eq.$ \,$(3.8)), we obtain two equations in two unknowns:

\frac{g_i}{1-p_i} &=& 10^{-3 M_i T / t_{60}(0)}
\eqsp e^{-M_iT...
\eqsp e^{-M_iT/\tau(\pi/T)} \isdefs R_\pi^{M_i}\\ [5pt]

where $ D_i\isdeftext M_iT$ denotes the $ i$th delay-line length in seconds. These two equations are readily solved to yield

p_i &=& \frac{R_0^{M_i}-R_\pi^{M_i}}{R_0^{M_i}+R_\pi^{M_i}}\\ [5pt]
g_i &=& \frac{2R_0^{M_i}R_\pi^{M_i}}{R_0^{M_i}+R_\pi^{M_i}}

The truncated series approximation

$\displaystyle R_\omega^{M_i} \isdefs e^{-\frac{M_iT}{\tau(\omega)}}
\approxs 1 ...
\isdefs 1 - \frac{6.91\,M_i}{n_{60}(\omega)}

has been found to work well in practical FDN reverberators.

Orthogonalized First-Order Delay-Filter Design

In [217], first-order delay-line filters of the form

$\displaystyle H_i(z) \eqsp g'_i \frac{1-p_i}{1-p_iz^{-1}}

are proposed. Clearly $ g_i=g'_i\cdot(1-p_i)$. This form has the advantage that the dc gain is always $ H_i(1)=g'_i$ for all (stable) values of $ p_i$. Therefore, we can set $ g'_i$ to give a desired reverberation time at dc, and not have to change it when $ p_i$ is varied to modulate the high-frequency decay rate. As in the previous section, from Eq.$ \,$(3.9), we obtain

$\displaystyle g'_i \eqsp 10^{-3 M_i T / t_{60}(0)}.

A calculation given in [217] arrives at

$\displaystyle p_i \eqsp \frac{\mbox{ln}(10)}{4}\log_{10}(g_i)\left(1-\frac{1}{\alpha^2}\right)


$\displaystyle \alpha \isdef \frac{t_{60}(\pi/T)}{t_{60}(0)} \protect$ (4.10)

denotes the ratio of reverberation time at half the sampling rate divided by the reverberation time at dc.4.16

Multiband Delay-Filter Design

In §3.7.5, we derived first-order FDN delay-line filters which can independently set the reverberation time at dc and at half the sampling rate. However, perceptual studies indicate that reverberation time should be independently adjustable in at least three frequency bands [217]. To provide this degree of control (and more), one can implement a multiband delay-line filter using a general-purpose filter bank [370,500]. The output, say, of each delay line is split into $ K$ bands, where $ K\ge 3$ is recommended, and then, from Eq.$ \,$(3.6), the gain in the $ k$th band for a length $ M_i$ delay-line can be set to

$\displaystyle G^{M_i}(e^{j\omega_kT})\eqsp 10^{-\frac{3M_i}{n_{60}(\omega_k)}} ...
...ln}(10)\,M_i}{n_{60}(\omega_k)} \approxs

to produce the desired decay-time in that band, where $ n_{60}(\omega)=t_{60}(\omega)/T$ denotes the desired 60-dB decay time in samples. Faust implementations of FDN reverberator along these lines are described in §3.7.9 below.

Spectral Coloration Equalizer

In the previous section, a ``graphical equalizer'' was used to set the reverberator decay time independently in each spectral band slice. While this gives much control over decay time, there is no control over the initial spectral gain in each band. Therefore, another good place for a graphical equalizer is at the reverberator input or output. Such an equalizer allows control of the initial spectral coloration of the reverberator. In the example of Fig.3.10, a spectral coloration equalizer is most efficiently applied to the input signal $ u(n)$, before entering the FDN (but after splitting off the direct signal to be scaled by $ d$ and added to the output), or the output of $ E(z)$ in Fig.3.10.

Tonal Correction Filter

Let $ h_k(n)$ denote the component of the impulse response arising from the $ k$th pole of the system. Then the energy associated with that pole is

$\displaystyle {\cal E}_k \eqsp \sum_{n=0}^\infty \left\vert h_k(n)\right\vert^2.

All other factors being equal, if the decay time of the mode is shortened by half, it follows that the total energy contributed by that mode to the impulse response is also reduced by half. To compensate for this effect, Jot introduced a tonal correction filter $ E(z)$ to be placed in series with the FDN, as shown in Fig.3.10.

In the case of the first-order delay-line filters discussed in §3.7.5, good tonal correction is given by the following one-zero filter:

$\displaystyle E(z) \eqsp \frac{ 1 - bz^{-1}}{1-b}


$\displaystyle b \eqsp \frac{1-\alpha}{1+\alpha}

and $ \alpha$ is defined in Eq.$ \,$(3.10).

FDNs as Digital Waveguide Networks

As discussed in §C.15, the FDN using a Householder-reflection feedback matrix $ \mathbf{A}_N = \mathbf{I}_N -
(2/N)\underline{u}\underline{u}^T$ is equivalent to a network of $ N$ digital waveguides intersecting at a single scattering junction [463,464,385]. The wave impedance in the $ i$th waveguide is simply $ \underline{u}[i]$, the $ i$th element of the axis-of-reflection vector $ \underline{u}$. The choice $ \underline{u}^T=[1,1,\dots,1]$ corresponds to all of the waveguides having the same impedance (the ``isotrophic junction'' case).

FDN Reverberators in Faust

The Faust example reverb_designer.dsp brings up a $ 16\times
16$ FDN reverberator in which the signal out of each delay line is split into five bands so that $ t_{60}(\omega_k)$ can be controlled independently in each band. The 16 delay-line lengths are distributed exponentially between a minimum and maximum length set by two min/max-length sliders, but rounded to the nearest integer-power of a distinct prime, as introduced above in §3.7.3). The FDN reverberator is implemented in Faust's effect.lib. The band-splitting is carried out by the filterbank function in Faust's filter.lib.

The Faust function filterbank(order,freqs) implements a filter bank having the needed properties using Butterworth lowpass/highpass band-splitting arranged in a dyadic tree (normally a good choice for audio filter banks). That is, the whole spectrum is split at the highest crossover frequency, the lowpass region is then split into two bands at the next crossover frequency down, and so on, splitting the lowpass band at each stage in the dyadic tree [455,500]. The number of poles in each Butterworth lowpass/highpass filter is specified by order, and freqs contains a list of desired crossover frequencies separating the bands. A certain amount of dispersion is also introduced, since the filter bank is causal and delay-equalized (so that the bands may be summed without phase cancellation artifacts at the band edges). Also note that the lower bands are effectively produced by higher order filters than the upper bands. When the reverberation time is longer than the dispersion delay, the dispersion should not be audible as such, although it can affect the ``sound'' of the reverberation. In general, however, artificial reverberators normally benefit from additional allpass dispersion.

Figure 3.12 shows the block diagram of a $ 4\times4$ FDN reverberator made from Faust's reverb_designer.dsp by changing 16 to 4. Figure 3.13 shows the Faust block diagram of the associated $ 4\times4$ Hamard matrix multiplication. As it shows, multiplication by a Hadamard matrix can be implemented (ignoring the normalizing scale factor) as a series of block sums and differences (often called butterflies or shufflers) in which the block size decreases by a factor of 2 each stage. Figures for the remaining components of the reverberator may be perused via the shell command faust2firefox reverb_designer.dsp followed by clicking on the blocks in the browser.

Figure 3.12: FDN reverberator implemented in the Faust example reverb_designer.dsp, but scaled down from order 16 to order 4.

Figure 3.13: Hadamard processing used in the Faust FDN reverberator.


A FOSS4.17 reverberator that combines elements of Schroeder (§3.5) and FDN reverberators (§3.7) is zita-rev1,4.18written in C++ for Linux systems by Fons Adriaensen. A Faust version of the zita-rev1 stereo-mode functionality is zita_rev1 in Faust's effect.lib. A high-level block diagram appears in Fig.3.14.

Figure 3.14: High-level block diagram of zita_rev1_engine() in Faust's effect.lib generated by the shell script faust2firefox.

The main high-level addition relative to an 8th-order FDN reverberator is the block labeled allpass_combs in Fig.3.14. This block inserts a Schroeder allpass comb filter (Fig.2.30) in series with each delay line. In zita-rev1 (as of this writing), the allpass-comb feedforward/feedback coefficients are all set to $ \pm 0.6$. The delay-line lengths and other details are readily found in the freely available source code (or by browsing the Faust-generated block diagram).

Zita-Rev1 Delay-Line Filters

In zita-rev1, the damping filter for each delay line consists of a low-shelf filter $ H_l(z)$ [449],4.19in series with a unique first-order lowpass filter $ H_h(z)$ that sets the high-frequency $ t_{60}$ to be half that of the middle-band at a particular frequency $ f_h$ (specified as ``HF Damping'' in the GUI). Since the filter $ H_h$ is constrained to be a lowpass, $ t_{60}(f)<t_{60}(f_h)$ for $ f>f_h$, i.e., the decay time gets shorter at higher frequencies.

Viewing the resulting damping filter $ H_d(z)=H_l(z)H_h(z)$ as a three-band filter bank3.7.5), let $ g_0$ and $ g_m$ denote the desired band gains at dc and ``middle frequencies'', respectively.4.20 Then the low shelf may be set for a desired dc-gain of $ g_0/g_m$, and its input (or output) signal multiplied by $ g_m$ to obtain the resulting filter

$\displaystyle H_l(z) \eqsp g_m + (g_0-g_m)\frac{1-p_l}{2}\frac{1+z^{-1}}{1-p_lz^{-1}},

where $ p_l$ denotes the (real) first-order lowpass pole, given by [449]

$\displaystyle p_l \isdefs \frac{1-\pi f_1T}{1+\pi f_1T},

where $ f_1$ specifies (in Hz) the crossover point between ``low'' and ``middle'' frequencies, and $ T$ denotes the sampling interval as usual.

The lowpass filter $ H_h(z)$ is also first order, and to provide half the middle-band $ t_{60}$ at the beginning of the ``high'' band, the lowpass should ``break'' to a gain of $ g_m$ at the ``HF Damping'' frequency $ f_h$ specified in the GUI. A unity-dc-gain one-pole lowpass has the form [449]

$\displaystyle H_h(z) = \frac{1-p_h}{1-p_hz^{-1}},

where the pole $ p_h$ must be found to give a gain of $ g_M$ at frequency $ f_h$:

$\displaystyle \left\vert H_h\left(e^{j2\pi f_hT}\right)\right\vert \eqsp
\left\vert\frac{1-p_h}{1-p_he^{-j2\pi f_hT}}\right\vert \eqsp g_M

Squaring and normalizing yields a quadratic equation of the form $ p_h^2 + b\,p_h +1=0$. Solving for $ p_h$ using the quadratic formula yields

$\displaystyle p_h \eqsp -\frac{b}{2} - \sqrt{\left(\frac{b}{2}\right)^2 - 1},


$\displaystyle -\frac{b}{2} \eqsp \frac{1-g_M^2\cos(2\pi f_h T)}{1-g_M^2} > 1,

and the unstable solution $ -b/2 + \sqrt{(b/2)^2 - 1} > 1$ is discarded. To ensure $ \vert g_M\vert<1$, the GUI must limit the middle-band $ t_{60}$ to finite values. (The upper limit is presently $ 8$ seconds for both low and middle frequencies.)

Further Extensions

Schroeder's original structures for artificial reverberation were comb filters and allpass filters made from two comb filters. Since then, they have been upgraded to include specific early reflections and per-sample air-absorption filtering (Moorer, Schroeder), precisely specified frequency dependent reverberation time (Jot), and a nearly independent factorization of ``coloration'' and ``duration'' aspects (Jot). The evolution from comb filters to feedback delay networks (Gerzon, Stautner, Puckette, Jot) can be seen as a means for obtaining greater richness of feedback, so that the diffuseness of the impulse response is greater than what is possible with parallel and/or series comb filters. In fact, an FDN can be seen as a richly cross-coupled bank of feedback comb filters whenever the diagonal of the feedback matrix is nonzero. The question then becomes what aspects of artificial reverberation have not yet been fully addressed?

Spatialization of Reverberant Reflections

While we did not go into the subject here, the early reflections should be spatialized by including a head-related transfer function (HRTF) on each tap of the early-reflection delay line [248].4.21

Some kind of spatialization may be needed also for the late reverberation. A true diffuse field3.2.1) consists of a sum of plane waves traveling in all directions in 3D space. Since we do not know how to achieve this effect using current systems for reverberation, the typical goal is to simply extract uncorrelated outputs from the reverberation network and feed them to the various output channels, as discussed in §3.5. However, this is not ideal, since the resulting sound field consists of wavefronts arriving from each of the speakers, and it is possible for the reverberation to sound like it is emanating from discrete speaker locations. It may be that spatialization of some kind can better fool the ear into believing the late reverberation is coming from all directions.

Distribution of Mode Frequencies

Another way in which current reverberation systems are ``artificial'' is the unnaturally uniform distribution of resonant modes with respect to frequency. Because Schroeder, FDN, and waveguide reverbs are all essentially a collection of $ N$ delay lines with feedback around them, the modes tend to be distributed as the superposition of the resonant modes of $ N$ feedback comb filters. Since a feedback comb filter has a nearly harmonic set of modes (see §2.6.2), aggregates of comb filters tend to provide a uniform modal density in the frequency domain. In real reverberant spaces, the mode density increases as frequency squared, so it should be verified that the uniform modes used in a reverberator are perceptually equivalent to the increasingly dense modes in nature. Another aspect of perception to consider is that frequency-domain perception of resonances actually decreases with frequency. To summarize, in nature the modes get denser with frequency, while in perception they are less resolved, and in current reverberation systems they stay more or less uniform with frequency; perhaps a uniform distribution is a good compromise between nature and perception?

At low frequencies, however, resonant modes are accurately perceived in reverberation as boosts, resonances, and cuts. They are analogous to early reflections in the time domain, and we could call them the ``early resonances.'' It is interesting that no system for artificial reverberation except waveguide mesh reverberation (of which the author is aware) explicitly attempts precise shaping of the low-frequency amplitude response of a desired reverberant space, at least not directly. The low-frequency response is shaped indirectly by the choice of early reflections, and the use of parallel comb-filter banks in Schroeder reverberators serves also to shape the low-frequency response significantly. However, it would be possible to add filters for shaping more carefully the low-frequency response. Perhaps a reason for this omission is that hall designers work very hard to eliminate any explicit resonances or antiresonances in the response of a room. If uneven resonance at low frequencies is always considered a defect, then designing for a maximally uniform mode distribution, as has been discussed for the high-frequency modes, would be ideal also at low frequencies. Quite the opposite situation exists when designing ``small-box reverberators'' to simulate musical instrument resonators [428,203]; there, the low-frequency modes impart a characteristic timbre on the low-frequency resonance of the instrument (see Fig.3.2).

Digital Waveguide Reverberators

It was mentioned in §3.7.8 above that FDNs can be formulated as special cases of Digital Waveguide Networks (DWN) (see Appendix C for a fuller development of DWNs). Specifically, an FDN is obtained from a DWN consisting of a single scattering junctionC.15). It follows that the DWN paradigm provides a more generalized framework in which to pursue further improvements of reverberation architecture. For example, when multiple FDNs are embedded within a single DWN, it becomes possible to richly cross-couple them in an energy-controlled manner in order to create richer recursive structures than either alone. General DWNs were proposed for artificial reverberation in [430,433].

The Digital Waveguide Mesh for Reverberation

A special case of digital waveguide networks known as the digital waveguide mesh has also been proposed for use in artificial reverberation systems [396,518].

As discussed in §2.4, a digital waveguide (bidirectional delay line) can be considered a computational acoustic model for traveling waves in opposite directions. A mesh of such waveguides in 2D or 3D can simulate waves traveling in any direction in the space. As an analogy, consider a tennis racket in which a rectilinear mesh of strings forms a pseudo-membrane.

A major advantage of the waveguide mesh for reverberation applications is that wavefronts are explicitly simulated in all directions, as in real reverberant spaces. Therefore, a true diffuse field can be developed in the late reverberation. Also, the echo density grows with time and the mode density grows with frequency in a natural manner for the 2D and 3D mesh. Finally, the low-frequency modes of the reverberant space can be simulated very precisely (for better or worse).

The computational cost of a waveguide mesh is made tractable relative to more conventional finite-difference simulations by (1) the use of multiply-free scattering junctions and (2) very coarse meshes. Use of a coarse mesh means that the ``physical modeling'' aspects of the mesh are only valid at low frequencies. As practical matter, this works out well because the ear cannot hear mode tuning errors at high frequencies. There is no error in the mode dampings in a lossless reverberator prototype, because the waveguide mesh is lossless by construction. Therefore, the only errors relative to an ideal simulation of a lossless membrane or space are (1) mode tuning error, and (2) finite band width (cut off at half the sampling rate). The tuning error can be understood as due to dispersion of the traveling waves in certain directions [518,399]. Much progress has been made on the problem of correcting this dispersion error in various mesh geometries (rectilinear, triangular, tetrahedral, etc.) [521,398,399].

See §C.14 for an introduction to the digital waveguide mesh and a few of its properties.

Time Varying Reverberators

In real rooms, thermal convention currents cause the propagation path delays to vary over time [58]. Therefore, for greater physical accuracy, the delay lines within a digital reverberator should vary over time. From a more practical perspective, time variation helps to break up and obscure unwanted repetition in the late reverberation impulse response [430,104].

Next Section:
Delay-Line and Signal Interpolation
Previous Section:
Acoustic Modeling with Digital Delay