Perceptual Aspects of Reverberation

Artificial reverberation is an unusually interesting signal processing problem because, as discussed in the previous sections, the ``obvious'' methods based on physical modeling or input-output modeling are too expensive computationally for most applications. This raises the question of which aspects of reverberation are perceptually important, and how they can be provided by efficient computational structures.

Perception of Echo Density and Mode Density

The reverberation problem can be greatly simplified without sacrificing perceptual quality. For example, it can be shown (footnote 4.3) that for typical rooms, the echo density increases as $ t^2$, where $ t$ is time. Therefore, beyond some time, the echo density is so great that it can be modeled as some uniformly sampled stochastic process without loss of perceptual fidelity. In particular, there is no need to explicitly compute multiple echoes per sample of sound. For smoothly decaying late reverb (the desired kind), an appropriate random process sampled at the audio sampling rate will sound equivalent perceptually.
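As a rough illustration of this last point, the following Python sketch (NumPy assumed) generates Gaussian white noise at the audio sampling rate and shapes it with an exponential envelope that falls by 60 dB over a chosen time. The function name `synth_late_reverb`, its parameters, and the choice of Gaussian noise are assumptions made for illustration only; this is a stand-in for smoothly decaying late reverb, not one of the reverberator structures described in later sections.

```python
import numpy as np

def synth_late_reverb(t60, fs=44100, duration=None, seed=0):
    """Exponentially decaying Gaussian noise (hypothetical helper).

    t60      : time in seconds for the amplitude envelope to fall by 60 dB
    fs       : sampling rate in Hz
    duration : length in seconds (defaults to 1.5 * t60, an arbitrary choice)
    """
    if duration is None:
        duration = 1.5 * t60
    n = np.arange(int(round(duration * fs)))
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n.size)
    # A 60 dB amplitude drop is a factor of 10**-3, reached at n = t60 * fs.
    envelope = 10.0 ** (-3.0 * n / (t60 * fs))
    return noise * envelope

# Usage: one second of decay time at an 8 kHz sampling rate.
h_late = synth_late_reverb(t60=1.0, fs=8000)
```

One random sample per audio sample is generated here, which reflects the observation above that individual echoes need not be computed once the echo density is high enough.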

Similarly, it can be shown (footnote 4.4) that the number of resonant modes in any given frequency band increases as frequency squared, so that above some frequency, the modes are so dense that they are perceptually equivalent to a random frequency response generated according to some statistics. In particular, there is no need to explicitly implement resonances so densely packed that the ear cannot hear them all.

In summary, we see that, based on limits of perception, the impulse response of a reverberant room can be divided into two segments. The first segment, called the early reflections, consists of the relatively sparse first echoes in the impulse response. The remainder, called the late reverberation, is so densely populated with echoes that it is best to characterize the response statistically in some way. Section 3.3 discusses methods for simulating early reflections in the reverberation impulse response.

Similarly, the frequency response of a reverberant room can be divided into two segments. The low-frequency interval consists of a relatively sparse distribution of resonant modes, while at higher frequencies the modes are packed so densely that they are best characterized statistically as a random frequency response with certain (regular) statistical properties. Section 3.4 describes methods for synthesizing high-quality late reverberation.

Perceptual Metrics for Ideal Reverberation

Some desirable controls for an artificial reverberator include [218]

The time to decay 60 dB ($ t_{60}$) is a classical objective parameter used as a measure of perceived reverberation time. Classically, $ t_{60}$ was measured for the whole response. More recently [216], it has become more common to design for a given $ t_{60}$ at more than one frequency, e.g., one for low frequencies, another for high frequencies, and interpolated values at intermediate frequencies. Perceptual studies indicate that reverberation time should be independently adjustable in at least three frequency bands [217].

Energy Decay Curve

For measuring and defining reverberation time $ t_{60}$, Schroeder introduced the so-called energy decay curve (EDC), which is the tail integral of the squared impulse response at time $ t$:

$\displaystyle \hbox{EDC}(t) \isdef \int_t^\infty h^2(\tau)\,d\tau$

Thus, $ \hbox{EDC}(t)$ is the total amount of signal energy remaining in the reverberator impulse response at time $ t$. The EDC decays more smoothly than the impulse response itself, and so it is more useful than ordinary amplitude envelopes for estimating $ t_{60}$.
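For a sampled impulse response, the tail integral becomes a reversed cumulative sum of squared samples. The Python sketch below (NumPy assumed) computes this discrete EDC and estimates $ t_{60}$ by fitting a straight line to the EDC in dB. The function names, the fitting range of -5 dB to -25 dB, and the extrapolation of the fitted slope to 60 dB are illustrative assumptions, not a procedure taken from the text.

```python
import numpy as np

def energy_decay_curve(h):
    """Discrete EDC: edc[n] = sum of h[m]**2 over m >= n."""
    h = np.asarray(h, dtype=float)
    return np.cumsum(h[::-1] ** 2)[::-1]

def estimate_t60(h, fs, lo_db=-5.0, hi_db=-25.0):
    """Estimate t60 from the EDC slope (fit range is an assumed choice)."""
    edc = energy_decay_curve(h)
    # Normalize to 0 dB at n = 0; clamp to avoid log10(0) on silent tails.
    ratio = np.maximum(edc / edc[0], np.finfo(float).tiny)
    edc_db = 10.0 * np.log10(ratio)
    t = np.arange(len(edc)) / fs
    mask = (edc_db <= lo_db) & (edc_db >= hi_db)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # slope in dB per second
    return -60.0 / slope

# Usage: an ideal exponential decay whose amplitude falls 60 dB in 0.5 s.
fs = 8000
n = np.arange(2 * fs)
h_exp = 10.0 ** (-3.0 * n / (0.5 * fs))
t60_est = estimate_t60(h_exp, fs)
```

For this ideal decay the estimate should come out very close to 0.5 s; for measured responses the result depends on the noise floor and on the fitting range chosen.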

Energy Decay Relief

The energy decay relief (EDR) is a time-frequency distribution which generalizes the EDC to multiple frequency bands [215]:

$\displaystyle \hbox{EDR}(t_n,f_k) \isdef \sum_{m=n}^M \left\vert H(m,k)\right\vert^2$

where $ H(m,k)$ denotes bin $ k$ of the short-time Fourier transform (STFT) at time-frame $ m$ [12,451], and $ M$ denotes the total number of time frames. The FFT within the STFT is typically used with a window, such as a Hann window of length 30 or 40 ms.

Thus, $ \hbox{EDR}(t_n,f_k)$ is the total amount of signal energy remaining in the reverberator's impulse response at time $ t_n=nT$ in a frequency band centered about $ f_k=kf_s/N$ Hz, where $ N$ denotes the FFT length.
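The EDR definition can be sketched in Python (NumPy assumed) as a Hann-windowed STFT followed by a reversed cumulative sum of the squared bin magnitudes along the time-frame axis. The function name `energy_decay_relief`, the 50% frame overlap, the zero-padding of the final frame, and the power-of-two FFT length are all assumptions made for this example; the text itself specifies only the window type and its approximate length.

```python
import numpy as np

def energy_decay_relief(h, fs, win_ms=30.0, hop_frac=0.5):
    """Return (edr, times, freqs); edr[n, k] sums |H(m, k)|**2 over m >= n."""
    h = np.asarray(h, dtype=float)
    win_len = int(round(win_ms * 1e-3 * fs))
    hop = max(1, int(win_len * hop_frac))         # assumed 50% overlap
    nfft = 1 << (win_len - 1).bit_length()        # next power of two >= win_len
    window = np.hanning(win_len)
    # Zero-pad so that the final frame is complete.
    n_frames = 1 + max(0, int(np.ceil((len(h) - win_len) / hop)))
    padded = np.zeros((n_frames - 1) * hop + win_len)
    padded[:len(h)] = h
    frames = np.stack([padded[m * hop:m * hop + win_len] * window
                       for m in range(n_frames)])
    H = np.fft.rfft(frames, n=nfft, axis=1)       # H[m, k]: frame m, bin k
    power = np.abs(H) ** 2
    edr = np.cumsum(power[::-1], axis=0)[::-1]    # tail sum over frames
    times = np.arange(n_frames) * hop / fs        # frame start times in seconds
    freqs = np.arange(nfft // 2 + 1) * fs / nfft  # f_k = k * fs / N
    return edr, times, freqs

# Usage: EDR of one second of exponentially decaying noise at 8 kHz.
fs_demo = 8000
rng = np.random.default_rng(0)
idx = np.arange(fs_demo)
h_demo = rng.standard_normal(fs_demo) * 10.0 ** (-3.0 * idx / (0.5 * fs_demo))
edr, times, freqs = energy_decay_relief(h_demo, fs_demo)
edr_db = 10.0 * np.log10(edr / edr.max())
```

Only the nonnegative-frequency bins are kept, since the impulse response is real. Plotting `edr_db` against `times` and `freqs` gives a surface of the kind usually shown for an EDR, here without any Bark-scale warping.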

The EDR of a violin-body impulse response is shown in Fig. 3.2. For better correspondence with audio perception, the frequency axis is warped to the Bark frequency scale [459], and energy is summed within each Bark band (one critical band of hearing equals one Bark). A violin body can be regarded as a very small reverberant room, with correspondingly ``magnified'' spectral structure relative to reverberant rooms.

Figure 3.2: Energy Decay Relief of a violin-body impulse response (from [203]).

The EDR of the Boston Symphony Hall is displayed in [153, p. 96].

The EDR is used to measure partial overtone dampings from recordings of a vibrating string in §6.11.5.
