Delay-Line Interpolation

Free Books Physical Audio Signal Processing

As mentioned above, when an audio delay line needs to vary smoothly over time, some form of interpolation between samples is usually required to avoid ``zipper noise'' in the output signal as the delay length changes. There is a hefty literature on ``fractional delay'' in discrete-time systems, and the survey in [267] is highly recommended.

This section will describe the most commonly used cases. Linear interpolation is perhaps most commonly used because it is very straightforward and inexpensive, and because it sounds very good when the signal bandwidth is small compared with half the sampling rate. For a delay line in a nearly lossless feedback loop, such as in a vibrating string simulation, allpass interpolation is sometimes a better choice since it costs the same as linear interpolation in the first-order case and has no gain distortion. (Feedback loops can be very sensitive to gain distortions.) Finally, in later sections, some higher-order interpolation methods are described.

Linear Interpolation

Linear interpolation works by effectively drawing a straight line between two neighboring samples and returning the appropriate point along that line.

More specifically, let $\eta$ be a number between 0 and 1 which represents how far we want to interpolate a signal between time and time . Then we can define the linearly interpolated value $\hat y(n+\eta)$ as follows:

$\displaystyle \hat y(n+\eta) = (1-\eta) \cdot y(n) + \eta \cdot y(n+1) \protect$

(5.1)

For $\eta=0$ , we get exactly $\hat y(n)=y(n)$ , and for $\eta=1$ , we get exactly $\hat y(n+1)=y(n+1)$ . In between, the interpolation error $\left\vert\hat y(n+\eta)-y(n+\eta)\right\vert$ is nonzero, except when

happens to be a linear function between

and

One-Multiply Linear Interpolation

Note that by factoring out $\eta$ , we can obtain a one-multiply form,

$\displaystyle \hat y(n+\eta) = y(n) + \eta\cdot\left[y(n+1) - y(n)\right].$

Thus, the computational complexity of linear interpolation is one multiply and two additions per sample of output.

Fractional Delay Filtering by Linear Interpolation

**Figure 4.1:** Linearly interpolated delay line.
$\includegraphics[width=\twidth]{eps/delayli}$

A linearly interpolated delay line is depicted in Fig.4.1. In contrast to Eq. $\,$ (4.1), we interpolate linearly between times and , and $\eta$ is called the fractional delay in samples. The first-order (linear-interpolating) filter following the delay line in Fig.4.1 may be called a fractional delay filter [267]. Equation (4.1), on the other hand, expresses the more general case of an interpolated table lookup, where is regarded as a table of samples and $\hat y(n+\eta)$ is regarded as an interpolated table-lookup based on the samples stored at indices and .

The difference between a fractional delay filter and an interpolated table lookup is that table-lookups can ``jump around,'' while fractional delay filters receive a sequential stream of input samples and produce a corresponding sequential stream of interpolated output values. As a result of this sequential access, fractional delay filters may be recursive IIR digital filters (provided the desired delay does not change too rapidly over time). In contrast, ``random-access'' interpolated table lookups are typically implemented using weighted linear combinations, making them equivalent to nonrecursive FIR filters in the sequential case.^5.1

The C++ class implementing a linearly interpolated delay line in the Synthesis Tool Kit (STK) is called DelayL.

The frequency response of linear interpolation for fixed fractional delay ( $\eta$ fixed in Fig.4.1) is shown in Fig.4.2. From inspection of Fig.4.1, we see that linear interpolation is a one-zero FIR filter. When used to provide a fixed fractional delay, the filter is linear and time-invariant (LTI). When the fractional delay $\eta$ changes over time, it is a linear time-varying filter.

**Figure 4.2:** Linear interpolation frequency responses for delays between 0 and 1. Note the higher accuracy at low frequencies, reaching zero error at dc for all fractional delays.
$\includegraphics[width=\twidth]{eps/linear1}$

Linear interpolation sounds best when the signal is oversampled. Since natural audio spectra tend to be relatively concentrated at low frequencies, linear interpolation tends to sound very good at high sampling rates.

When interpolation occurs inside a feedback loop, such as in digital waveguide models for vibrating strings (see Chapter 6), errors in the amplitude response can be highly audible (particularly when the loop gain is close to 1, as it is for steel strings, for example). In these cases, it is possible to eliminate amplitude error (at some cost in delay error) by using an allpass filter for delay-line interpolation.

First-Order Allpass Interpolation

A delay line interpolated by a first-order allpass filter is drawn in Fig.4.3.

**Figure 4.3:** Allpass-interpolated delay line.
$\includegraphics[width=\twidth]{eps/delayai}$

Intuitively, ramping the coefficients of the allpass gradually ``grows'' or ``hides'' one sample of delay. This tells us how to handle resets when crossing sample boundaries.

The difference equation is

$\begin{eqnarray*} {\hat x}(n-\Delta) \isdef y(n) &=& \eta \cdot x(n) + x(n-1) - ... ...y(n-1) \\ &=& \eta \cdot \left[ x(n) - y(n-1)\right] + x(n-1). \end{eqnarray*}$

Thus, like linear interpolation, first-order allpass interpolation requires only one multiply and two adds per sample of output.

The transfer function is

$\displaystyle H(z) = \frac{\eta + z^{-1}}{1 + \eta z^{-1}}. \protect$

(5.2)

At low frequencies ( $z\to 1$ ), the delay becomes

$\displaystyle \Delta \approx \frac{1-\eta}{1+\eta} \protect$

(5.3)

Figure 4.4 shows the phase delay of the first-order digital allpass filter for a variety of desired delays at dc. Since the amplitude response of any allpass is 1 at all frequencies, there is no need to plot it.

**Figure 4.4:** Allpass-interpolation phase delay for a variety of desired delays (exact at dc).
$\includegraphics[width=\twidth]{eps/allpass1}$

The first-order allpass interpolator is generally controlled by setting its dc delay to the desired delay. Thus, for a given desired delay $\Delta$ , the allpass coefficient is (from Eq. $\,$ (4.3))

$\displaystyle \eta \approx \frac{1-\Delta}{1+\Delta}$

From Eq. $\,$ (4.2), we see that the allpass filter's pole is at $z=-\eta$ , and its zero is at $z=-1/\eta$ . A pole-zero diagram for $\Delta =0.1$ is given in Fig.4.5. Thus, zero delay is provided by means of a pole-zero cancellation! Due to inevitable round-off errors, pole-zero cancellations are to be avoided in practice. For this reason and others (discussed below), allpass interpolation is best used to provide a delay range lying wholly above zero, e.g.,

$\displaystyle \Delta\in[0.1,1.1] \;\longleftrightarrow\; \eta\in[-0.05,0.82]$

**Figure 4.5:** Allpass-interpolator pole-zero diagram for $\Delta =0.1$ .
$\includegraphics[width=4.8in]{eps/ap1pz}$

Note that, unlike linear interpolation, allpass interpolation is not suitable for ``random access'' interpolation in which interpolated values may be requested at any arbitrary time in isolation. This is because the allpass is recursive so that it must run for enough samples to reach steady state. However, when the impulse response is reasonably short, as it is for delays near one sample, it can in fact be used in ``random access mode'' by giving it enough samples with which to work.

The STK class implementing allpass-interpolated delay is DelayA.

Minimizing First-Order Allpass Transient Response

In addition to approaching a pole-zero cancellation at , another undesirable artifact appears as $\Delta\to0$ : The transient response also becomes long when the pole at $z=-\eta$ gets close to the unit circle.

A plot of the impulse response for $\Delta =0.1$ is shown in Fig.4.6. We see a lot of ``ringing'' near half the sampling rate. We actually should expect this from the nonlinear-phase distortion which is clearly evident near half the sampling rate in Fig.4.4. We can interpret this phenomenon as the signal components near half the sampling rate being delayed by different amounts than other frequencies, therefore ``sliding out of alignment'' with them.

**Figure 4.6:** Impulse response of the first-order allpass interpolator for $\Delta =0.1$ .
$\includegraphics[width=\twidth]{eps/ap1ir}$

For audio applications, we would like to keep the impulse-response duration short enough to sound ``instantaneous.'' That is, we do not wish to have audible ``ringing'' in the time domain near . For high quality sampling rates, such as larger than kHz, there is no issue of direct audibility, since the ringing is above the range of human hearing. However, it is often convenient, especially for research prototyping, to work at lower sampling rates where is audible. Also, many commercial products use such sampling rates to save costs.

Since the time constant of decay, in samples, of the impulse response of a pole of radius is approximately

$\displaystyle \frac{\tau}{T} \approx \frac{1}{1-R},$

and since a 60-dB decay occurs in about 7 time constants (`` $t_{60}$ '') [451, p. 38], we can limit the pole of the allpass filter to achieve any prescribed specification on maximum impulse-response duration.

For example, suppose 100 ms is chosen as the maximum $t_{60}$ allowed at a sampling rate of . Then applying the above constraints yields $\eta\leq 0.993$ , corresponding to the allowed delay range .

Linear Interpolation as Resampling

Convolution Interpretation

Linearly interpolated fractional delay is equivalent to filtering and resampling a weighted impulse train (the input signal samples) with a continuous-time filter having the simple triangular impulse response

$\displaystyle h_l(t) = \left\{\begin{array}{ll} 1-\left\vert t/T\right\vert, & ... ...ght\vert\leq T, \\ [5pt] 0, & \hbox{otherwise}. \\ \end{array} \right. \protect$

(5.4)

Convolution of the weighted impulse train with

produces a continuous-time linearly interpolated signal

$\displaystyle x(t) = \sum_{n=-\infty}^{\infty} x(nT) h_l(t-nT). \protect$

(5.5)

This continuous result can then be resampled at the desired fractional delay.

In discrete time processing, the operation Eq. $\,$ (4.5) can be approximated arbitrarily closely by digital upsampling by a large integer factor , delaying by samples (an integer), then finally downsampling by , as depicted in Fig.4.7 [96]. The integers and are chosen so that $\eta \approx L/M$ , where $\eta$ the desired fractional delay.

**Figure 4.7:** Linear interpolation as a convolution.
$\includegraphics[width=0.8\twidth]{eps/polyphaseli}$

The convolution interpretation of linear interpolation, Lagrange interpolation, and others, is discussed in [407].

Frequency Response of Linear Interpolation

Since linear interpolation can be expressed as a convolution of the samples with a triangular pulse, we can derive the frequency response of linear interpolation. Figure 4.7 indicates that the triangular pulse serves as an anti-aliasing lowpass filter for the subsequent downsampling by . Therefore, it should ideally ``cut off'' all frequencies higher than .

Triangular Pulse as Convolution of Two Rectangular Pulses

The 2-sample wide triangular pulse (Eq. $\,$ (4.4)) can be expressed as a convolution of the one-sample rectangular pulse with itself.

**Figure 4.8:** The width rectangular pulse.
$\includegraphics{eps/rectpulse}$

The one-sample rectangular pulse is shown in Fig.4.8 and may be defined analytically as

$\displaystyle p_T(t) \isdef u\left(t+\frac{T}{2}\right) - u\left(t-\frac{T}{2}\right),$

where

is the Heaviside unit step function:

$\displaystyle u(t) \isdef \left\{\begin{array}{ll} 1, & t\geq 0 \\ [5pt] 0, & t<0 \\ \end{array}\right..$

Convolving

with itself produces the two-sample triangular pulse

$\displaystyle h_l(t) = (p_T\ast p_T)(t) \isdef \int_{-\infty}^{\infty} p_T(\tau)p_T(t-\tau)d\tau$

While the result can be verified algebraically by substituting

for

, it seen more quickly via graphical convolution.

Linear Interpolation Frequency Response

Since linear interpolation is a convolution of the samples with a triangular pulse $h_l(t) = (p_T\ast p_T)(t)$ (from Eq. $\,$ (4.5)), the frequency response of the interpolation is given by the Fourier transform , which yields a sinc function. This frequency response applies to linear interpolation from discrete time to continuous time. If the output of the interpolator is also sampled, this can be modeled by sampling the continuous-time interpolation result in Eq. $\,$ (4.5), thereby aliasing the sinc frequency response, as shown in Fig.4.9.

**Figure:** (a) $\protect$ sinc $^2(fT) = {\cal F}_f^2\{p_T/T\}$ .
(b) $\protect\alias _{10}\{\protect$ sinc $^2(fT)\} \propto \protect\dtft \{\protect\sample _{T/10}\{p_T\ast p_T\}\}$ .
$\includegraphics[width=3in]{eps/sincsquared}$

In slightly more detail, from $h_l(t) = (p_T\ast p_T)(t)$ , and ${\cal F}_f\{p_T\}= T$ sinc, we have

$\displaystyle H_l(f) \isdef {\cal F}_f\{h_l\} = {\cal F}_f\{p_T\ast p_T\} = {\cal F}^2_f\{p_T\} = T^2$ sinc $\displaystyle ^2(fT)$

where we used the convolution theorem for Fourier transforms, and the fact that

sinc $( f T) = \sin(\pi f T) / (\pi f)$ .

The Fourier transform of is the same function aliased on a block of size Hz. Both and its alias are plotted in Fig.4.9. The example in this figure pertains to an output sampling rate which is times that of the input signal. In other words, the input signal is upsampled by a factor of using linear interpolation. The ``main lobe'' of the interpolation frequency response contains the original signal bandwidth; note how it is attenuated near half the original sampling rate ( in Fig.4.9). The ``sidelobes'' of the frequency response contain attenuated copies of the original signal bandwidth (see the DFT stretch theorem), and thus constitute spectral imaging distortion in the final output (sometimes also referred to as a kind of ``aliasing,'' but, for clarity, that term will not be used for imaging distortion in this book). We see that the frequency response of linear interpolation is less than ideal in two ways:

The spectrum is ``rolled'' off near half the sampling rate. In fact, it is nowhere flat within the ``passband'' (-1 to 1 in Fig.4.9).
Spectral imaging distortion is suppressed by only 26 dB (the level of the first sidelobe in Fig.4.9.

These qualitative remarks apply to all upsampling factors $L\geq 2$ using linear interpolation. The case

is considered in the next section.

Special Cases

In the limiting case of , the input and output sampling rates are equal, and all sidelobes of the frequency response (partially shown in Fig.4.9) alias into the main lobe.

If the output is sampled at the same exact time instants as the input signal, the input and output are identical. In terms of the aliasing picture of the previous section, the frequency response aliases to a perfect flat response over $fT\in[-1,1]$ , with all spectral images combining coherently under the flat gain. It is important in this reconstruction that, while the frequency response of the underlying continuous interpolating filter is aliased by sampling, the signal spectrum is only imaged--not aliased; this is true for all positive integers and in Fig.4.7.

More typically, when linear interpolation is used to provide fractional delay, identity is not obtained. Referring again to Fig.4.7, with considered to be so large that it is effectively infinite, fractional-delay by $\tau<1$ can be modeled as convolving the samples with $h_l(t-\tau)$ followed by sampling at . In this case, a linear phase term has been introduced in the interpolator frequency response, giving,

$\displaystyle H_\tau(f) \isdef e^{-j\tau 2\pi f} H_l(f)$

so that now downsampling to the original sampling rate does not yield a perfectly flat amplitude response for $fT\in[-1,1]$ (when $\tau$ is non-integer). Moreover, the phase response is nonlinear as well; a sampled symmetric triangular pulse is only linear phase when the samples fall symmetrically about the midpoint. Some example frequency-responses for various delays $\tau$ are graphed in Fig.4.2.

Large Delay Changes

When implementing large delay length changes (by many samples), a useful implementation is to cross-fade from the initial delay line configuration to the new configuration:

Computational requirements are doubled during the cross-fade.
The cross-fade should occur over a time interval long enough to yield a smooth result.
The new delay interpolation filter, if any, may be initialized in advance of the cross-fade, for maximum smoothness. Thus, if the transient response of the interpolation filter is samples, the new delay-line + interpolation filter can be ``warmed up'' (executed) for time steps before beginning the cross-fade. If the cross-fade time is long compared with the interpolation filter duration, ``pre-warming'' is not necessary.
This is not a true ``morph'' from one delay length to another since we do not pass through the intermediate delay lengths. However, it avoids a potentially undesirable Doppler effect.
A single delay line can be shared such that the cross-fade occurs from one read-pointer (plus associated filtering) to another.

Next Section:
Lagrange Interpolation
Previous Section:
FDN Reverberation