For linear time-invariant systems, rather than build an explicit discrete-time model as in §7.3 for each mass, spring, and dashpot, (or inductor, capacitor, and resistor for virtual analog models), we may instead choose to model only the transfer function between selected inputs and outputs of the physical system. It should be emphasized that this is an option only when the relevant portion of the system is linear and time invariant (LTI), or at least sufficiently close to LTI. Transfer-function modeling can be considered a kind of ``large-scale'' or ``macroscopic'' modeling in which an entire physical subsystem, such as a guitar body, is modeled by a single transfer function relating specific inputs and outputs. (A transfer function can also of course be a matrix relating a vector of inputs to a vector of outputs .) Such models are used extensively in the field of control system design .9.1
Transfer-function modeling is often the most cost-effective way to incorporate LTI lumped elements (Ch. 7) in an otherwise physical computational model. For wave-propagating distributed systems, on the other hand, such as vibrating strings and acoustic tubes, digital waveguides models (Ch. 6) are more efficient than transfer-function models, in addition to having a precise physical interpretation that transfer-function coefficients lack. In models containing lumped elements, or distributed components that are not characterized by wave propagation, maximum computational efficiency is typically obtained by deciding which LTI portions of the model can be ``frozen'' as ``black boxes'' characterized only by their transfer functions. In return for increased computational efficiency, we sacrifice the ability to access the interior of the black box in a physically meaningful way.
An example where such ``macroscopic'' transfer-function modeling is normally applied is the trumpet bell (§9.7.2). A fine-grained model might use a piecewise cylindrical or piecewise conical approximation to the flaring bell . However, there is normally no need for an explicit bell model in a practical virtual instrument when the bell is assumed to be LTI and spatial directivity variations are neglected. In such cases, the transmittance and reflectance of the bell can be accurately summarized by digital filters having frequency responses that are optimal approximations to the measured (or theoretical) bell responses (§9.7). However, it is then not so easy to insert a moveable virtual ``mute'' into the bell reflectance/transmittance filters. This is an example of the general trade-off between physical extensibilty and computational efficiency/parsimony.
- Sampling to get
- Pole mappings (such as followed by Prony's method)
- Modal expansion
- Frequency-response matching using digital filter design methods
Next, we'll look at the more specialized technique known as commuted synthesis, in which computational efficiency may be greatly increased by interchanging (``commuting'') the series order of component transfer functions. Commuted synthesis delivers large gains in efficiency for systems with a short excitation and high-order resonators, such plucked and struck strings. In Chapter 9, commuted synthesis is applied to piano modeling.
Extracting the least-damped modes of a transfer function for separate parametric implementation is often used in commuted synthesis. We look at a number of ways to accomplish this goal toward the end of this chapter.
We close the chapter with a simple example of transfer-function modeling applied to the digital phase shifter audio effect. This example classifies as virtual analog modeling, in which a valued analog device is converted to digital form in a way that preserves all valued features of the original. Further examples of transfer-function models appear in Chapter 9.
Sampling the impulse response of a system is of course quite elementary. The main thing to watch out for is aliasing, and the main disadvantage is high computational complexity when the impulse-response is long.
Since we have defined (in §7.2) the driving-point admittance as the nominal transfer function of a system port, corresponding to defining the input as driving force and the output as resulting velocity (see Fig.7.3), we have that is defined as the system impulse response. Note, however, that the driving force and observed velocity need not be at the same physical point, and in general we may freely define any physical input and output points. Nevertheless, if the outputs are in velocity units and the inputs are in force units, then the transfer-function matrix will have units of admittance, and we will assume this for simplicity.
Sampling the impulse response can be expressed mathematically as .9.2 In practice, we can only record a finite number of impulse-response samples. Usually a graceful taper (e.g., using the right half of an FFT window, such as the Hann window) yields better results than simple truncation. The system model is then implemented as a Finite Impulse Response (FIR) digital filter (§2.5.4). The next section describes the related impulse-invariant method for digital filter design which derives an infinite impulse response (IIR) digital filter that matches the analog filter impulse response exactly at the sampling times.
Sampling the impulse response has the advantage of preserving resonant frequencies (see next section), but its big disadvantage is aliasing of the frequency response. No ``system'' is truly bandlimited. For example, even a simple mass and dashpot with a nonzero initial condition produces a continuous decaying exponential response that is not bandlimited.
Before a continuous impulse response is sampled, a lowpass filter should be considered for eliminating all frequency components at half the sampling rate and above. In other words, the system itself should be ``lowpassed'' to avoid aliasing in many applications. (On the other hand, there are also many applications in which the frequency-response aliasing is not objectional to the ear.) If the system is linear and time invariant, and if we excite the system with input signals and initial conditions that are similarly bandlimited to less than half the sampling rate, no signal inside the system or appearing at the outputs will be aliased. In other words, these conditions yield an ideal bandlimited system simulation that remains exact (for the bandlimited signals) at the sampling instants.
Note, however, that time variation or nonlinearity (both common in practical instruments), together with feedback, will ``pump'' the signal spectrum higher and higher until aliasing is ultimately encountered (see §6.13). For this reason, feedback loops in the digital system may need additional lowpass filtering to attenuate newly generated high frequencies.
A sampled impulse response is an example of a nonparametric representation of a linear, time-invariant system. It is not usually regarded as a physical model, even when the impulse-response samples have a physical interpretation (such as when no anti-aliasing filter is used).
Impulse Invariant Method
The impulse-invariant method converts analog filter transfer functions to digital filter transfer functions in such a way that the impulse response is the same (invariant) at the sampling instants , [362, pp. 216-219]. Thus, if denotes the impulse-response of an analog (continuous-time) filter, then the digital (discrete-time) filter given by the impulse-invariant method will have impulse response , where denotes the sampling interval in seconds. Moreover, the order of the filter is preserved, and IIR analog filters map to IIR digital filters. However, the digital filter's frequency response is an aliased version of the analog filter's frequency response.9.3
To derive the impulse-invariant method, we begin with the analog transfer function
and perform a partial fraction expansion (PFE) down to first-order terms :9.4
and the residues have remained unchanged. Clearly we must have , i.e., the analog poles must lie within the bandwidth spanned by the digital sampling rate . Otherwise, the pole angle will be aliased into the interval . Stability is preserved since .
Note that the series combination of two digital filters designed by the impulse-invariant method is not impulse invariant. In other terms, the convolution of two sampled analog signals is not the same as the sampled convolution of those analog signals. This is easy to see when aliasing is considered. For example, let one signal be the impulse response of an ideal lowpass filter cutting off below half the sampling rate. Then this signal will not alias when sampled, and its convolution with any second signal will similarly not alias when sampled. However, if the second signal does alias upon sampling, then this aliasing is gone when the convolution precedes the sampling, and the results cannot be the same in the two cases.
The matched z transformation uses the same pole-mapping Eq.(8.2) as in the impulse-invariant method, but the zeros are handled differently. Instead of only mapping the poles of the partial fraction expansion and letting the zeros fall where they may, the matched z transformation maps both the poles and zeros in the factored form of the transfer function [362, pp. 224-226].
The factored form  of a transfer function
can be written as
The matched z transformation is carried out by replacing each first-order term of the form by its digital equivalent , i.e.,
Thus, the matched z transformation normally yields different digital zeros than the impulse-invariant method. The impulse-invariant method is generally considered superior to the matched z transformation .
Relation to Finite Difference Approximation
which is the FDA definition in the frequency domain given in Eq.(7.3).
Since the FDA equals the match z transformation for the point , it maps analog dc () to digital dc () exactly. However, that is the only point on the frequency axis that is perfectly mapped, as shown in Fig.7.15.
Pole Mapping with Optimal Zeros
We saw in the preceding sections that both the impulse-invariant and the matched- transformations map poles from the left-half plane to the interior of the unit circle in the plane via
where is the location of the th pole in the plane (assumed to lie in the strip to avoid aliasing). The zeros, on the other hand, were different because the impulse-invariant method started with the partial fraction expansion while the matched- transformation started with the factored form of the transfer function.
It is hard to do better Eq.(8.8) as a pole mapping from to , when aliasing is avoided, because it preserves both the resonance frequency and bandwidth for a complex pole . Therefore, good practical modeling results can be obtained by optimizing the zeros (residues) to achieve audio criteria given these fixed poles. Alternatively, only the least-damped poles need be constrained in this way, e.g., to fix and preserve the most important resonances of a stringed-instrument body or acoustic space.
A well known approach to transfer-function modeling is called modal synthesis, introduced in §1.3.9 [5,299,6,145,381,30]. Modal synthesis may be defined as constructing a source-filter synthesis model in which the filter transfer function is implemented as a sum of first- and/or second-order filter sections (i.e., as a parallel filter bank in which each filter is at most second-order--this was reviewed in §1.3.9). In other words, the physical system is represented as a superposition of individual modes driven by some external excitation (such as a pluck or strike).
In acoustics, the term mode of vibration, or normal mode, normally refers to a single-frequency spatial eigensolution of the governing wave equation. For example, the modes of an ideal vibrating string are the harmonically related sinusoidal string-shapes having an integer number of uniformly spaced zero-crossings (or nodes) along the string, including its endpoints. As first noted by Daniel Bernoulli (§A.2), acoustic vibration can be expressed as a superposition of component sinusoidal vibrations, i.e., as a superposition of modes.9.6
When a single mode is excited by a sinusoidal driving force, all points of the physical object vibrate at the same temporal frequency (cycles per second), and the mode shape becomes proportional to the spatial amplitude envelope of the vibration. The sound emitted from the top plate of a guitar, for example, can be represented as a weighted sum of the radiation patterns of the respective modes of the top plate, where the weighting function is constructed according to how much each mode is excited (typically by the guitar bridge) [143,390,109,205,209].
The impulse-invariant method (§8.2), can be considered a special case of modal synthesis in which a continuous-time -plane transfer function is given as a starting point. More typically, modal synthesis starts with a measured frequency response, and a second-order parallel filter bank is fit to that in some way. In particular, any filter-design technique may be used (§8.6), followed by a conversion to second-order parallel form.
Modal expansions find extensive application in industry for determining parametric frequency responses (superpositions of second-order modes) from measured vibration data . Each mode is typically parametrized in terms of its resonant frequency, bandwidth (or damping), and gain (most generally complex gain, to include phase).
Modal synthesis can also be seen as a special case of source-filter synthesis, which may be defined as any signal model based on factoring a sound-generating process into a filter driven by some (usually relatively simple) excitation signal. An early example of source-filter synthesis is the vocoder (§A.6.1). Whenever the filter in a source-filter model is implemented as a parallel second-order filter bank, we can call it modal synthesis (although, strictly speaking, each filter section should correspond to a resonant mode of the modeled resonant system).
Also related to modal synthesis is so-called formant synthesis, used extensively in speech synthesis [219,255,389,40,81,491,490,313,363,297]. A formant is simply a filter resonance having some center-frequency, bandwidth, and (sometimes specified) gain. Thus, a formant corresponds to a single mode of vibration in the vocal tract. Many text-to-speech systems in use today are based on the Klatt formant synthesizer for speech .
Since the importance of spectral formants in sound synthesis has more to do with the way we hear than with the physical parameters of a system, formant synthesis is probably best viewed as a spectral modeling synthesis method , as opposed to a physical modeling technique . An exception to this rule may occur when the system consists physically of a parallel bank of second-order resonators, such as an array of tuning forks or Helmholtz resonators. In such a case, the mode parameters correspond to physically independent objects; this is of course rare in practice.
In the linear, time-invariant case, only modes in the range of human hearing need to be retained in the model. Also, any ``uncontrollable'' or ``unobservable'' modes should obviously be left out as well. With these simplificiation, the modal representation is generally more efficient than an explicit mass-spring-dashpot digitization (§7.3). On the other hand, since the modal representation is usually not directly physical, nonlinear extensions may be difficult and/or behave unnaturally.
As in the impulse-invariant method (§8.2), starting with an order transfer function describing the input-output behavior of a physical system, a modal description can be obtained immediately from the partial fraction expansion (PFE):
where denotes the th pole, is the th zero, and is the residue of the th pole . (For simplicity of notation, Eq.(8.9) is written for the case and distinct. See  for the details when this is not the case.) Since the system is real, complex poles will occur in complex conjugate pairs. This means that for each complex term , the PFE will also include the complex-conjugate term . Such terms can be summed to create half as many real second-order terms:
Let denote the number of complex pole pairs, and be the number of real poles. Then the modal expansion can be written as
The real poles, if any, can be paired off into a sum of second-order sections as well, with one first-order section left over when is odd. Note that each component second-order mode consists of one zero and two poles. The zero is necessary to adjust the phase of the resonant mode. The mode phase affects the frequency response at frequencies between resonances (from smoothly varying to the appearance of a deep notch somewhere between them).
A typical practical implementation of each mode in the sum is by means of second-order filter, or biquad section . A biquad section is easily converted from continuous- to discrete-time form, such as by the impulse-invariant method of §8.2. The final discrete-time model then consist of a parallel bank of second-order digital filters
where when the number of real poles is even, or otherwise. The filter coefficients and are now digital second-order section coefficients which can be computed from the continuous-time filter coefficients, if they are known, or they can be directly estimated from discrete-time data.
Expressing complex poles of in polar form as
, (where now we assume denotes a pole in the
plane), we obtain the classical relations 
These parameters are in turn related to the modal parameters resonance frequency (Hz) and bandwidth (Hz) by
Often and are estimated from a measured log-magnitude frequency-response (see §8.8 for a discussion of various methods).
It remains to determine the overall section gain in Eq.(8.12), and the coefficient determining the zero location. If (as is common in practice, since the human ear is not very sensitive to spectral nulls), then can be chosen to match the measured peak height. A particularly simple automatic method is to use least-squares minimization to compute using Eq.(8.12) with and fixed (and ). Generally speaking, least-squares minimization is always immediate when the error to be minimized is linear in the unknown parameter (in this case ).
To estimate in Eq.(8.12), an effective method is to work initially with a complex, first-order partial fraction expansion, and use least squares to determine complex gain coefficients for first-order (complex) terms of the form
Sections implementing two real poles are not conveniently specified in terms of resonant frequency and bandwidth. In such cases, it is more common to work with the filter coefficients and directly, as they are often computed by system-identification software [288,428] or general purpose filter design utilities, such as in Matlab or Octave .
To summarize, numerous techniques exist for estimating modal parameters from measured frequency-response data, and some of these are discussed in §8.8 below. Specifically, resonant modes can be estimated from the magnitude, phase, width, and location of peaks in the Fourier transform of a recorded (or estimated) impulse response Another well known technique is Prony's method [449, p. 393], . There is also a large literature on other methods for estimating the parameters of exponentially decaying sinusoids; examples include the matrix-pencil method , and parametric spectrum analysis in general .
State Space Approach to Modal Expansions
The preceding discussion of modal synthesis was based primarily on fitting a sum of biquads to measured frequency-response peaks. A more general way of arriving at a modal representation is to first form a state space model of the system , and then convert to the modal representation by diagonalizing the state-space model. This approach has the advantage of preserving system behavior between the given inputs and outputs. Specifically, the similarity transform used to diagonalize the system provides new input and output gain vectors which properly excite and observe the system modes precisely as in the original system. This procedure is especially more convenient than the transfer-function based approach above when there are multiple inputs and outputs. For some mathematical details, see 9.7For a related worked example, see §C.17.6.
When a subset of the resonating modes are nearly harmonically tuned, it can be much more computationally efficient to use a filtered delay loop (see §2.6.5) to generate an entire quasi-harmonic series of modes rather than using a biquad for each modal peak . In this case, the resonator model becomes
Frequency-Response Matching Using
Digital Filter Design Methods
where we assume . Similarly, point-to-point ``trans-admittances'' can be defined as the velocity Laplace transform at one point on the physical object divided by the driving-force Laplace transform at some other point. There is also of course no requirement to always use driving force and observed velocity as the physical variables; velocity-to-force, force-to-force, velocity-to-velocity, force-to-acceleration, etc., can all be used to define transfer functions from one point to another in the system. For simplicity, however, we will prefer admittance transfer functions here.
Ideal Differentiator (Spring Admittance)
Figure 8.1 shows a graph of the frequency response of the ideal differentiator (spring admittance). In principle, a digital differentiator is a filter whose frequency response optimally approximates for between and . Similarly, a digital integrator must match along the unit circle in the plane. The reason an exact match is not possible is that the ideal frequency responses and , when wrapped along the unit circle in the plane, are not ``smooth'' functions any more (see Fig.8.1). As a result, there is no filter with a rational transfer function (i.e., finite order) that can match the desired frequency response exactly.
The discontinuity at alone is enough to ensure that no finite-order digital transfer function exists with the desired frequency response. As with bandlimited interpolation (§4.4), it is good practice to reserve a ``guard band'' between the highest needed frequency (such as the limit of human hearing) and half the sampling rate . In the guard band , digital filters are free to smoothly vary in whatever way gives the best performance across frequencies in the audible band at the lowest cost. Figure 8.2 shows an example. Note that, as with filters used for bandlimited interpolation, a small increment in oversampling factor yields a much larger decrease in filter cost (when the sampling rate is near ).
In the general case of Eq.(8.14) with , digital filters can be designed to implement arbitrarily accurate admittance transfer functions by finding an optimal rational approximation to the complex function of a single real variable
Digital Filter Design Overview
This section (adapted from ), summarizes some of the more commonly used methods for digital filter design aimed at matching a nonparametric frequency response, such as typically obtained from input/output measurements. This problem should be distinguished from more classical problems with their own specialized methods, such as designing lowpass, highpass, and bandpass filters [343,362], or peak/shelf equalizers [559,449], and other utility filters designed from a priori mathematical specifications.
The problem of fitting a digital filter to a prescribed frequency response may be formulated as follows. To simplify, we set .
with given, such that some norm of the error
is minimum with respect to the filter coefficients
The filter-design problem is then to find a (strictly) stable -pole, -zero digital filter which minimizes some norm of the error in the frequency-response. This is fundamentally rational approximation of a complex function of a real (frequency) variable, with constraints on the poles.
While the filter-design problem has been formulated quite naturally, it is difficult to solve in practice. The strict stability assumption yields a compact space of filter coefficients , leading to the conclusion that a best approximation exists over this domain.9.9Unfortunately, the norm of the error typically is not a convex9.10function of the filter coefficients on . This means that algorithms based on gradient descent may fail to find an optimum filter due to their premature termination at a suboptimal local minimum of .
Fortunately, there is at least one norm whose global minimization may be accomplished in a straightforward fashion without need for initial guesses or ad hoc modifications of the complex (phase-sensitive) IIR filter-design problem--the Hankel norm [155,428,177,36]. Hankel norm methods for digital filter design deliver a spontaneously stable filter of any desired order without imposing coefficient constraints in the algorithm.
An alternative to Hankel-norm approximation is to reformulate the problem, replacing Eq.(8.17) with a modified error criterion so that the resulting problem can be solved by linear least-squares or convex optimization techniques. Examples include
- Pseudo-norm minimization:
(Pseudo-norms can be zero for nonzero functions.)
For example, Padé approximation falls in this category.
In Padé approximation, the first
of the impulse-response of are matched exactly,
and the error in the remaining impulse-response samples is ignored.
- Ratio Error: Minimize
. Minimizing the norm of the ratio error
yields the class of methods known as linear prediction
techniques [20,296,297]. Since,
by the definition of a norm, we have
it follows that
ratio error methods ignore the phase of the
approximation. It is also evident that ratio error is
minimized by making
.9.11 For this reason, ratio-error
methods are considered most appropriate for modeling the
spectral envelope of
. It is well known
that these methods are fast and exceedingly robust in
practice, and this explains in part why they are used almost
exclusively in some data-intensive applications such as speech
modeling and other spectral-envelope applications. In some
applications, such as adaptive control or forecasting, the
fact that linear prediction error is minimized can justify
- Equation error: Minimize
The above expression makes it clear that equation-error can be seen as a frequency-response error weighted by . Thus, relatively large errors can be expected where the poles of the optimum approximation (roots of ) approach the unit circle . While this may make the frequency-domain formulation seem ill-posed, in the time-domain, linear prediction error is minimized in the sense, and in certain applications this is ideal. (Equation-error methods thus provide a natural extension of ratio-error methods to include zeros.) Using so-called Steiglitz-McBride iterations [287,449,288], the equation-error solution iteratively approaches the norm-minimizing solution of Eq.(8.17) for the L2 norm.
Examples of minimizing equation error using the matlab function invfreqz are given in §8.6.3 and §8.6.4 below. See [449, Appendix I] (based on [428, pp. 48-50]) for a discussion of equation-error IIR filter design and a derivation of a fast equation-error method based on the Fast Fourier Transform (FFT) (used in invfreqz).
- Conversion to real-valued approximation: For
example, power spectrum matching, i.e., minimization of
, is possible
using the Chebyshev or
norm . Similarly,
linear-phase filter design can be carried out with some
guarantees, since again the problem reduces to real-valued
approximation on the unit circle. The essence of these
methods is that the phase-response is eliminated from
the error measure, as in the norm of the ratio error, in order
to convert a complex approximation problem into a real one.
Real rational approximation of a continuous curve appears to
be solved in principle only under the
- Decoupling poles and zeros: An effective example of
this approach is Kopec's method  which consists of
using ratio error to find the poles, computing the error
, inverting it, and fitting poles again (to
). There is a wide variety of methods which first
fit poles and then zeros. None of these methods produce
optimum filters, however, in any normal sense.
In addition to these modifications, sometimes it is necessary to reformulate the problem in order to achieve a different goal. For example, in some audio applications, it is desirable to minimize the log-magnitude frequency-response error. This is due to the way we hear spectral distortions in many circumstances. A technique which accomplishes this objective to the first order in the norm is described in .
Sometimes the most important spectral structure is confined to an interval of the frequency domain. A question arises as to how this structure can be accurately modeled while obtaining a cruder fit elsewhere. The usual technique is a weighting function versus frequency. An alternative, however, is to frequency-warp the problem using a first-order conformal map. It turns out a first-order conformal map can be made to approximate very well frequency-resolution scales of human hearing such as the Bark scale or ERB scale . Frequency-warping is especially valuable for providing an effective weighting function connection for filter-design methods, such as the Hankel-norm method, that are intrinsically do not offer choice of a weighted norm for the frequency-response error.
There are several methods which produce instead of directly. A fast spectral factorization technique is useful in conjunction with methods of this category . Roughly speaking, a size polynomial factorization is replaced by an FFT and a size system of linear equations.
We saw the ideal digital differentiator frequency response in Fig.8.1, where it was noted that the discontinuity in the response at made an ideal design unrealizable (infinite order). Fortunately, such a design is not even needed in practice, since there is invariably a guard band between the highest supported frequency and half the sampling rate .
Figure 8.2 illustrates a more practical design specification for the digital differentiator as well as the performance of a tenth-order FIR fit using invfreqz (which minimizes equation error) in Octave.9.12 The weight function passed to invfreqz was 1 from 0 to 20 kHz, and zero from 20 kHz to half the sampling rate (24 kHz). Notice how, as a result, the amplitude response follows that of the ideal differentiator until 20 kHz, after which it rolls down to a gain of 0 at 24 kHz, as it must (see Fig.8.1). Higher order fits yield better results. Using poles can further improve the results, but the filter should be checked for stability since invfreqz designs filters in the frequency domain and does not enforce stability.9.13
Fitting Filters to Measured Amplitude Responses
The preceding filter-design example digitized an ideal differentiator, which is an example of converting an LTI lumped modeling element into a digital filter while maximally preserving its frequency response over the audio band. Another situation that commonly arises is the need for a digital filter that matches a measured frequency response over the audio band.
Figure 8.3 shows a plot of simulated amplitude-response measurements at 10 frequencies equally spread out between 100 Hz and 3 kHz on a log frequency scale. The ``measurements'' are indicated by circles. Each circle plots, for example, the output amplitude divided by the input amplitude for a sinusoidal input signal at that frequency . These ten data points are then extended to dc and half the sampling rate, interpolated, and resampled to a uniform frequency grid (solid line in Fig.8.3), as needed for FFT processing. The details of these computations are listed in Fig.8.4. We will fit a four-pole, one-zero, digital-filter frequency-response to these data.9.14
NZ = 1; % number of ZEROS in the filter to be designed NP = 4; % number of POLES in the filter to be designed NG = 10; % number of gain measurements fmin = 100; % lowest measurement frequency (Hz) fmax = 3000; % highest measurement frequency (Hz) fs = 10000; % discrete-time sampling rate Nfft = 512; % FFT size to use df = (fmax/fmin)^(1/(NG-1)); % uniform log-freq spacing f = fmin * df .^ (0:NG-1); % measurement frequency axis % Gain measurements (synthetic example = triangular amp response): Gdb = 10*[1:NG/2,NG/2:-1:1]/(NG/2); % between 0 and 10 dB gain % Must decide on a dc value. % Either use what is known to be true or pick something "maximally % smooth". Here we do a simple linear extrapolation: dc_amp = Gdb(1) - f(1)*(Gdb(2)-Gdb(1))/(f(2)-f(1)); % Must also decide on a value at half the sampling rate. % Use either a realistic estimate or something "maximally smooth". % Here we do a simple linear extrapolation. While zeroing it % is appealing, we do not want any zeros on the unit circle here. Gdb_last_slope = (Gdb(NG) - Gdb(NG-1)) / (f(NG) - f(NG-1)); nyq_amp = Gdb(NG) + Gdb_last_slope * (fs/2 - f(NG)); Gdbe = [dc_amp, Gdb, nyq_amp]; fe = [0,f,fs/2]; NGe = NG+2; % Resample to a uniform frequency grid, as required by ifft. % We do this by fitting cubic splines evaluated on the fft grid: Gdbei = spline(fe,Gdbe); % say `help spline' fk = fs*[0:Nfft/2]/Nfft; % fft frequency grid (nonneg freqs) Gdbfk = ppval(Gdbei,fk); % Uniformly resampled amp-resp figure(1); semilogx(fk(2:end-1),Gdbfk(2:end-1),'-k'); grid('on'); axis([fmin/2 fmax*2 -3 11]); hold('on'); semilogx(f,Gdb,'ok'); xlabel('Frequency (Hz)'); ylabel('Magnitude (dB)'); title(['Measured and Extrapolated/Interpolated/Resampled ',... 'Amplitude Response']);
It is good to check that the desired impulse response is not overly aliased in the time domain. The impulse-response for this example is plotted in Fig.8.5. We see that it appears quite short compared with the inverse FFT used to compute it. The script in Fig.8.6 gives the details of this computation, and also prints out a measure of ``time-limitedness'' defined as the norm of the outermost 20% of the impulse response divided by its total norm--this measure was reported to be % for this example.
Ns = length(Gdbfk); if Ns~=Nfft/2+1, error("confusion"); end Sdb = [Gdbfk,Gdbfk(Ns-1:-1:2)]; % install negative-frequencies S = 10 .^ (Sdb/20); % convert to linear magnitude s = ifft(S); % desired impulse response s = real(s); % any imaginary part is quantization noise tlerr = 100*norm(s(round(0.9*Ns:1.1*Ns)))/norm(s); disp(sprintf(['Time-limitedness check: Outer 20%% of impulse ' ... 'response is %0.2f %% of total rms'],tlerr)); % = 0.02 percent if tlerr>1.0 % arbitrarily set 1% as the upper limit allowed error('Increase Nfft and/or smooth Sdb'); end figure(2); plot(s,'-k'); grid('on'); title('Impulse Response'); xlabel('Time (samples)'); ylabel('Amplitude');
Converting the Desired Amplitude Response to Minimum Phase
Phase-sensitive filter-design methods such as the equation-error method implemented in invfreqz are normally constrained to produce filters with causal impulse responses.9.15 In cases such as this (phase-sensitive filter design when we don't care about phase--or don't have it), it is best to compute the minimum phase corresponding to the desired amplitude response .
c = ifft(Sdb); % compute real cepstrum from log magnitude spectrum % Check aliasing of cepstrum (in theory there is always some): caliaserr = 100*norm(c(round(Ns*0.9:Ns*1.1)))/norm(c); disp(sprintf(['Cepstral time-aliasing check: Outer 20%% of ' ... 'cepstrum holds %0.2f %% of total rms'],caliaserr)); % = 0.09 percent if caliaserr>1.0 % arbitrary limit error('Increase Nfft and/or smooth Sdb to shorten cepstrum'); end % Fold cepstrum to reflect non-min-phase zeros inside unit circle: % If complex: % cf=[c(1),c(2:Ns-1)+conj(c(Nfft:-1:Ns+1)),c(Ns),zeros(1,Nfft-Ns)]; cf = [c(1), c(2:Ns-1)+c(Nfft:-1:Ns+1), c(Ns), zeros(1,Nfft-Ns)]; Cf = fft(cf); % = dB_magnitude + j * minimum_phase Smp = 10 .^ (Cf/20); % minimum-phase spectrum Smpp = Smp(1:Ns); % nonnegative-frequency portion wt = 1 ./ (fk+1); % typical weight fn for audio wk = 2*pi*fk/fs; [B,A] = invfreqz(Smpp,wk,NZ,NP,wt); Hh = freqz(B,A,Ns); figure(3); plot(fk,db([Smpp(:),Hh(:)])); grid('on'); xlabel('Frequency (Hz)'); ylabel('Magnitude (dB)'); title('Magnitude Frequency Response'); % legend('Desired','Filter');
Further Reading on Digital Filter Design
This section provided only a ``surface scratch'' into the large topic of digital filter design based on an arbitrary frequency response. The main goal here was to provide a high-level orientation and to underscore the high value of such an approach for encapsulating linear, time-invariant subsystems in a computationally efficient yet accurate form. Applied examples will appear in later chapters. We close this section with some pointers for further reading in the area of digital filter design.
Some good books on digital filter design in general include [343,362,289]. Also take a look at the various references in the help/type info for Matlab/Octave functions pertaining to filter design. Methods for FIR filter design (used in conjunction with FFT convolution) are discussed in Book IV , and the equation-error method for IIR filter design was introduced in Book II . See [281,282] for related techniques applied to guitar modeling. See  for examples of using matlab functions invfreqz and invfreqs to fit filters to measured frequency-response data (specifically the wah pedal design example). Other filter-design tools can be found in the same website area.
The overview of methods in §8.6.2 above is elaborated in , including further method details, application to violin modeling, and literature pointers regarding the methods addressed. Some of this material was included in [449, Appendix I].
In Octave or Matlab, say lookfor filter to obtain a list of filter-related functions. Matlab has a dedicated filter-design toolbox (say doc filterdesign in Matlab). In many matlab functions (both Octave and Matlab), there are literature citations in the source code. For example, type invfreqz in Octave provides a URL to a Web page (from ) describing the FFT method for equation-error filter design.
In acoustic stringed musical instruments such as guitars and pianos, the strings couple via a ``bridge'' to some resonating acoustic structure (typically made of wood) that is required for efficient transduction of string vibration to acoustic propagation in the surrounding air. The resonator also imposes its own characteristic frequency response on the radiated sound. Spectral characteristics of the string excitation, the string resonances, and body/soundboard/enclosure resonator, are thus combined multiplicatively in the radiated sound, as depicted in Fig. 8.9.
The excitation can now be convolved with the resonator impulse response to provide a single, aggregate, excitation table, as depicted in Fig. 8.11. This is the basic idea behind commuted synthesis, and it greatly reduces the complexity of stringed instrument implementations, since the body filter is replaced by an inexpensive lookup table [439,230]. These simplifications are presently important in single-processor polyphonic synthesizers such as multimedia DSP chips.
In the simplest case, the string is ``plucked'' using the (half-windowed) impulse response of the body.
An example of an excitation is the force applied by a pick or a finger at some point, or set of points, along the string. The input force per sample at each point divided by gives the velocity to inject additively at that point in both traveling-wave directions. (The factor of comes from splitting the injected velocity into two traveling-wave components, and from the fact that two string end-points are being driven.) Equal injection in the left- and right-going directions corresponds to an excitation force which is stationary with respect to the string.
In a practical instrument, the ``resonator'' is determined by the choice of output signal in the physical scenario, and it generally includes filtering downstream of the body itself, as shown in Fig. 8.12. A typical example for the guitar or violin would be to choose the output signal at a point a few feet away from the top plate of the body. In practice, such a signal can be measured using a microphone held at the desired output point and recording the response at that point to the striking of the bridge with a force hammer. It is useful to record simultaneously the output of an accelerometer mounted on the bridge in order to also obtain experimentally the driving-point impedance at the bridge. In general, it is desirable to choose the output close to the instrument so as to keep the resonator response as short as possible. The resonator components need to be linear and time invariant, so they will be commutative with the string and combinable with the string excitation signal via convolution.
The string should also be linear and time invariant in order to be able to commute it with the generalized resonator. However, the string is actually the least linear element of most stringed musical instruments, with the main effect of nonlinearity being often a slight increase of the fundamental vibration frequency with amplitude. A secondary effect is to introduce coupling between the two polarizations of vibration along the length of the string. In practice, however, the string can be considered sufficiently close to linear to permit commuting with the body. The string is also time varying in the presence of vibrato, but this too can be neglected in practice. While commuting a live string and resonator may not be identical mathematically, the sound is substantially the same.
There are various options when combining the excitation and resonator into an aggregate excitation, as shown in Fig. 8.11. For example, a wave-table can be prepared which contains the convolution of a particular point excitation with a particular choice of resonator. Perhaps the simplest choice of excitation is the impulse signal. Physically, this would be natural when the wave variables in the string are taken to be acceleration waves for a plucked string; in this case, an ideal pluck gives rise to an impulse of acceleration input to the left and right in the string at the pluck point. If loss of perceived pick position is unimportant, the impulse injection need only be in a single direction. (The comb filtering which gives rise to the pick-position illusion can be restored by injecting a second, negated impulse at a delay equal to the travel time to and from the bridge.) In this simple case of a single impulse to pluck the string, the aggregate excitation is simply the impulse response of the resonator. Many excitation and resonator variations can be simulated using a collection of aggregate excitation tables. It is useful to provide for interpolation of excitation tables so as to provide intermediate points along a parameter dimension. In fact, all the issues normally associated with sampling synthesis arise in the context of the string excitation table. A disadvantage of combining excitation and resonator is the loss of multiple output signals from the body simulation, but the timbral effects arising from the mixing together of multiple body outputs can be obtained via a mixing of corresponding excitation tables.
If the aggregate excitation is too long, it may be shortened by a variety of techniques. It is good to first convert the signal to minimum phase so as to provide the maximum shortening consistent with the original magnitude spectrum. Secondly, can be windowed using the right wing of any window function typically used in spectrum analysis. An interesting choice is the exponential window, since it has the interpretation of increasing the resonator damping in a uniform manner, i.e., all the poles and zeros of the resonator are contracted radially in the plane by the same factor.
In commuted synthesis, it is often helpful to factor out the least-damped resonances in the body model for implementation in parametric form (e.g., as second-order filters, or ``biquads''). The advantages of this are manifold, including the following:
- The excitation table is shortened (now containing only the most damped modal components).
- The excitation table signal-to-quantization-noise ratio is improved.
- The most important resonances remain parametric, facilitating real-time control. In other words, the parametric resonances can be independently modulated to produce interesting effects.
- Multiple body outputs become available from the parametric filters.
- Resonators may be already available in a separate effects unit, making
- A memory vs. computation trade-off is available for cost optimization.
Laurson et al. [278,277] have developed very high quality synthesis of classical guitar from an extended musical score using the commuted synthesis approach. A key factor in the quality of these results is the great attention to detail in the area of musical control when driving the model from a written score. In contrast to this approach, most of the sound examples in  were ``played'' in real time, using a MIDI guitar controller feeding the serial port of a NeXT Computer running SynthBuilder .9.17
Extracting Parametric Resonators from a Nonparametric Impulse Response
As mentioned in §8.7.1 above, a valuable way of shortening the excitation table in commuted synthesis is to factor the body resonator into its most-damped and least-damped modes. The most-damped modes are then commuted and combined with the excitation in impulse-response form. The least-damped modes can be left in parametric form as recursive digital filter sections.
Commuted synthesis is a technique in which the body resonator is commuted with the string model, as shown in Fig.8.10, in order to avoid having to implement a large body filter at all [439,232,229].9.18In commuted synthesis, the excitation (e.g., plucking force versus time) can be convolved with the resonator impulse response to provide a single aggregate excitation signal. This signal is short enough to store in a look-up table, and a note is played by simply summing it into the string.
The goal of resonator factoring is to identify and remove the least-damped resonant modes of the impulse response. In principle, this means ascertaining the precise resonance frequencies and bandwidths associated with each of the narrowest ``peaks'' in the resonator frequency response, and dividing them out via inverse filtering, so they can be implemented separately as resonators in series. If in addition the amplitude and phase of a resonance peak are accurately measurable in the complex frequency response, the mode can be removed by complex spectral subtraction (equivalent to subtracting the impulse-response of the resonant mode from the total impulse response); in this case, the parametric modes are implemented in a parallel bank as in . However, in the parallel case, the residual impulse response is not readily commuted with the string.
In the inverse-filtering case, the factored resonator components are in cascade (series) so that the damped modes left behind may be commuted with the string and incorporated in the excitation table by convolving the residual impulse response with the desired string excitation signal. In the parallel case, the damped modes do not commute with the string since doing so would require somehow canceling them in the parallel filter sections. In principle, the string would have to be duplicated so that one instance can be driven by the residual signal with no body resonances at all, while the other is connected to the parallel resonator bank and driven only by the natural string excitation without any commuting of string and resonator. Since duplicating the string is unlikely to be cost-effective, the impulse response of the high-frequency modes can be commuted and convolved with the string excitation as in the series case to obtain qualitative results. The error in doing this is that the high-frequency modes are being multiplied by the parallel resonators rather than being added to them.
Various methods are available for estimating the mode parameters for inverse filtering:
- Amplitude response peak measurement
- Weighted digital filter design
- Linear prediction
- Sinusoidal modeling
- Late impulse-response analysis
The longest ringing modes are associated with the narrowest bandwidths. When they are important resonances in the frequency response, they also tend to be the tallest peaks in the frequency response magnitude. (If they are not the tallest near the beginning of the impulse response, they will be the tallest near the end.) Therefore, one effective technique for measuring the least-damped resonances is simply to find the precise location and width of the narrowest and tallest spectral peaks in the measured amplitude response of the resonator. The center-frequency and bandwidth of a narrow frequency-response peak determine two poles in the resonator to be factored out. Expressing a filter in terms of its poles and zeros is one type of ``parametric'' filter representation, as opposed to ``nonparametric'' representations such as the impulse response or frequency response. Prony's method [449,297,273] is one well known technique for estimating the frequencies and bandwidths of sums of exponentially decaying sinusoids (two-pole resonator impulse responses).
In the factoring example presented in §8.8.6, the frequency and bandwidth of the main Helmholtz air mode are measured manually using an interactive spectrum analysis tool. However, it is a simple matter to automate peak-finding in FFT magnitude data. (See, for example, the peak finders used in sinusoidal modeling, discussed a bit further in §8.8.1 below.)
Many methods for digital filter design support spectral weighting functions that can be used to focus in on the least-damped modes in the frequency response. One is the weighted equation-error method which is available in the matlab invfreqz() function (§8.6.4). Figure 8.13 illustrates use of it. For simplicity, only one frequency-response peak plus noise is shown in this synthetic example. First, the peak center-frequency is measured using a quadratically interpolating peak finder operating on the dB spectral magnitude. This is used to set the spectral weighting function. Next, invfreqz() is called to design a two-pole filter having a frequency response that approximates the measured data as closely as possible. The weighting function is also shown in Fig.8.13, renormalized to overlay on the scale of the plot. Finally, the amplitude response of the two-pole filter designed by the equation-error method is shown overlaid in the figure. Note that the agreement is quite good near the peak which is what matters most. The interpolated peak frequency measured initially in the nonparametric spectral magnitude data can be used to fine-tune the pole-angles of the designed filter, thus rendering the equation-error method a technique for measuring only the peak bandwidth in this case. There are of course many, many techniques in the signal processing literature for measuring spectral peaks.
Another well known method for converting the least-damped modes into parametric form is Linear Predictive Coding (LP) followed by polynomial factorization to obtain resonator poles. LP is particularly good at fitting spectral peaks due to the nature of the error criterion it minimizes . The poles closest to the unit circle in the plane can be chosen for the ``ringy'' part of the resonator. It is well known that when using techniques like LP to model spectral peaks for extraction, orders substantially higher than twice the number of spectral peaks should be used. The extra degrees of freedom in the LP fit give more poles for modeling spectral detail other than the peaks, allowing the poles modeling the peaks to fit them with less distraction. On the other hand, if the order chosen is too high, sometimes more than two poles will home in on the same peak. In some cases, this may be appropriate since the body resonances are not necessary resolvable so as to separate the peaks, especially at high frequencies.
Another way to find the least-damped mode parameters is by means of an intermediate sinusoidal model of the body impulse response, or, more appropriately, the energy decay relief (EDR) computed from the body impulse response (see §3.2.2). Such sinusoidal models have been used to determine the string loop filter in digital waveguide strings models. In the case of string loop-filter estimation, the sinusoidal model is applied to the impulse response (or ``pluck'' response) of a vibrating string or acoustic tube. In the present application, it is ideally applied to the EDR of the body impulse response (or ``hammer-strike'' response).
Since sinusoidal modeling software (e.g. ) typically quadratically interpolates the peak frequencies, the resonance frequencies are generally quite accurately estimated provided the frame size is chosen large enough to span many cycles of the underlying resonance.
The sinusoidal amplitude envelopes yield a particularly robust measurement of resonance bandwidth. Theoretically, the modal decay should be exponential. Therefore, on a dB scale, the amplitude envelope should decay linearly. Linear regression can be used to fit a straight line to the measured log-amplitude envelope of the impulse response of each long-ringing mode. Note that even when amplitude modulation is present due to modal couplings, the ripples tend to average out in the regression and have little effect on the slope measurement. This robustness can be enhanced by starting and ending the linear regression on local maxima in the amplitude envelope. A method for estimating modal decay parameters in the presence of noise is given in [125,234,235].
Below is a section of matlab code which performs linear regression:
function [slope,offset] = fitline(x,y); %FITLINE fit line 'y = slope * x + offset' % to column vectors x and y. phi = [x, ones(length(x),1)]; p = phi' * y; r = phi' * phi; t = r\p; slope = t(1); offset = t(2);
All methods useable with inverse filtering can be modified based on the observation that late in the impulse response, the damped modes have died away, and the least-damped modes dominate. Therefore, by discarding initial impulse-response data, the problem in some sense becomes ``easier'' at the price of working closer to the noise floor. This technique is most appropriate in conjunction with the inverse filtering method for mode extraction (discussed below), since for subtraction, the modal impulse response must be extrapolated back to the beginning of the data record. However, methods used to compute the filter numerator in variations on Prony's method can be used to scale and phase-align a mode for subtraction [428,297].
One simple approximate technique based on looking only at the late impulse response is to take a zero-padded FFT of the latest Hanning-windowed data segment. The least-damped modes should give very clearly dominant peaks in the FFT magnitude data. As discussed above, the peak(s) can be interpolated to estimate the mode resonance frequency, and the bandwidth can be measured to determine the time-constant of decay. Alternatively, the time-constant of decay can be measured in the time domain by measuring the decay slope of the log amplitude envelope across the segment (this time using a rectangular window). Since the least-damped mode is assumed to be isolated in the late decay, it is easy to form a pitch-synchronous amplitude envelope.
Whatever poles are chosen for the least-damped part, and however they are computed (provided they are stable), the damped part can be computed from the full impulse response and parametric part using inverse filtering, as illustrated in the computed examples above. The inverse filter is formed from zeros equal to the estimated resonant poles. When the inverse filter is applied to the full resonator impulse, a ``residual'' signal is formed which is defined as the impulse response of the leftover, more damped modes. The residual is in exactly the nonparametric form needed for commuting with the string and convolving with the string excitation signal, such as a ``pluck'' signal. Feeding the residual signal to the parametric resonator gives the original resonator impulse response to an extremely high degree of accuracy. The error is due only to numerical round-off error during the inverse and forward filtering computations. In particular, the least-damped resonances need not be accurately estimated for this to hold. When there is parametric estimation error, the least-damped components will fail to be completely removed from the residual signal; however, the residual signal through the parametric resonator will always give an exact reconstruction of the original body impulse response, to within roundoff error. This is similar to the well known feature of linear predictive coding that feeding the prediction error signal to the LP model always gives back the original signal .
The parametric resonator need not be restricted to all-pole filters, however, although all-pole filters (plus perhaps zeros set manually to the same angles but contracted radii) turn out to be very convenient and simple to work with. Many filter design techniques exist which can produce a parametric part having any prescribed number of poles and zeros, and weighting functions can be used to ``steer'' the methods toward the least-damped components of the impulse response. The equation-error method illustrated in Fig. 8.13 is an example of a method which can also compute zeros in the parametric part as well as poles. However, for inverse filtering to be an option, the zeros must be constrained to be minimum phase so that their inverses will be stable poles.
In experiments factoring guitar body impulse responses, it was found that the largest benefit per section comes from pulling out the main Helmholtz air resonance. Doing just this shortens the impulse response (excitation table) by a very large factor, and because the remaining impulse response is noise-like, it can be truncated more aggressively without introducing artifacts.
It also appears that the bandwidth estimate is not very critical in this case. If it is too large, or if ``isolation zeros'' are not installed behind the poles, as shown in Figs. 8.18b and 8.21b, the inverse filtering serves partially as a preemphasis which tends to flatten the guitar body frequency response overall or cause it to rise with frequency. This has a good effect on the signal-to-quantization-noise ratio versus frequency. To maximize the worst-case signal-to-quantization-noise versus frequency, the residual spectrum should be flat since the quantization noise spectrum is normally close to flat. A preemphasis filter for flattening the overall spectrum is commonly used in speech analysis [363,297]. A better preemphasis in this context is an inverse equal-loudness preemphasis, taking the inverse of an equal-loudness contour near the threshold of hearing in the Fletcher-Munson curves . This corresponds to psychoacoustic ``noise shaping'' so that the quantization noise floor is perceptually uniform, and decreasing playback volume until it falls below the threshold of hearing results in all of the noise disappearing across the entire spectrum at the same volume.9.19
Since in some fixed-point implementations, narrow bandwidths may be difficult to achieve, good results are obtained by simply setting the bandwidth of the single resonator to any minimum robust value. As a result, there may still be some main-air-mode response in the residual signal, but it is typically very small, and early termination of it using a half-window for table shortening is much less audible than if the original impulse response were similarly half-windowed. The net effect on the instrument is to introduce artificial damping the main air mode in the guitar body. However, since this mode rings so much longer than the rest of the modes in the guitar body, shortening it does not appear to be detrimental to the overall quality of the instrument. In general, it is not desirable for isolated modes to ring longer than all others. Why would a classical guitarist want an audible ``ringing'' of the guitar body near Hz?
Matlab Code for Inverse Filtering
freq = 104.98; % estimated peak frequency in Hz bw = 10; % peak bandwidth estimate in Hz R = exp( - pi * bw / fs); % pole radius z = R * exp(j * 2 * pi * freq / fs); % pole itself B = [1, -(z + conj(z)), z * conj(z)] % numerator r = 0.9; % zero/pole factor (notch isolation) A = B .* (r .^ [0 : length(B)-1]); % denominator residual = filter(B,A,bodyIR); % apply inverse filter
When the amplitude envelope, frequency, phase, and onset time are all accurately estimated (for all time), it is possible to subtract the synthesized modal impulse response from the measured impulse response. (This contrasts with purely spectral modal parameters which are amplitude, frequency, bandwidth, and phase.) This method of ``sinusoidal track removal'' is used in sines-plus-noise spectral modeling. (See  for further details and supporting C software). In this approach, the resonant mode is subtracted out rather than divided out of the frequency response.
There are some disadvantages of subtraction relative to inverse filtering. First, more parameters must be accurately measured; the precise gain and phase of the resonance are needed in addition to its frequency and bandwidth. Inverse filtering on the other hand requires only estimation of frequency and bandwidth (or frequency and time-constant of decay). In addition, the residual impulse response after subtraction cannot be precisely commuted with the string for commuted synthesis.
The advantages of subtraction over inverse filtering are that amplitude modulation due to mode coupling can be retained in the measured modal decay and subtracted out, whereas a second-order inverse filter cannot subtract out the modulation due to coupling. Also, if the system is time varying (as happens, for example, when the performer's hand is pressing against the resonating body in a time-varying way), the subtraction method can potentially track the changing modal parameters and still successfully remove the modal decay. To compete with this, the inverse filtering method would have to support a time-varying filter model. As another example, if the guitar body is moving, the measured response is a time-varying linear combination of fixed resonant modes (although some Doppler shift is possible). The subtraction method can follow a time-varying gain (and phase) so that the subtraction still will take out the mode.
The inverse filtering method seems most natural when the physical resonator is time-invariant and is well modeled as a series of resonant sections. It is also the only one strictly valid for use in commuted synthesis. The subtraction method seems most natural when the physical resonator is best modeled as a sum of resonating modes. As a compromise between the two approaches, all parametric modes may be separated from the nonparametric modes by means of inverse filtering, and the parametric part can then be split into parallel form.
In , a method is described for producing a parallel bank of fourth-order IIR body filters from an arbitrary starting impulse response. Initially, the body impulse is approximated by the impulse response of a fourth-order IIR filter using an algorithm of second author Cheng; the method is said to pick out the most resonant modes of the sampled impulse response, and fourth-order sections were found experimentally to best characterize the resonant modes of the guitar body. The impulse response of the filter so designed is subtracted from the measured impulse response to form a residual impulse response which is given again to the IIR design algorithm. This process is repeated to obtain a parallel bank of fourth-order filters, plus a final residual which is discarded. In the present context, the final residual should be incorporated at least qualitatively into the string excitation signal, or else the method should be applied only to the total parametric impulse response computed by other means and separated from the nonparametric impulse response by inverse filtering.
Approximating Shortened Excitations as Noise
Figure 8.14b suggests that the many damped modes remaining in the shortened body impulse response may not be clearly audible as resonances since their decay time is so short. This is confirmed by listening to shortened and spectrally flattened body responses which sound somewhere between a click and a noise burst. These observations suggest that the shortened, flattened, body response can be replaced by a psychoacoustically equivalent noise burst, as suggested for modeling piano soundboards . Thus, the signal of Fig. 8.14b can be synthesized qualitatively by a white noise generator multiplied by an amplitude envelope. In this technique, it is helpful to use LP on the residual signal to flatten it. As a refinement, the noise burst can be time-varying filtered so that high frequencies decay faster . Thus, the stringed instrument model may consist of noise generator excitation amplitude-shaping time-varying lowpass string model parametric resonators multiple outputs. In addition, properties of the physical excitation may be incorporated, such as comb filtering to obtain a virtual pick or hammer position. Multiple outputs provide for individual handling of the most important resonant modes and facilitate simulation of directional radiation characteristics .
It is interesting to note that by using what is ultimately a noise-burst excitation, we have almost come full circle back to the original extended Karplus-Strong algorithm [236,207]. Differences include (1) the amplitude shaping of the excitation noise to follow the envelope of the impulse-response of the highly damped modes of the guitar body (convolved with the physical string excitation), (2) more sophisticated noise filtering which effectively shapes the noise in the frequency domain to follow the frequency response of the highly damped body modes (times the excitation spectrum), and (3) the use of factored-out body resonances which give real resonances such as the main Helmholtz air mode. The present analysis also provides a theoretical explanation as to why use of a noise burst in the Karplus-Strong algorithm is generally considered preferable to a theoretically motivated initial string shape such as the asymmetric triangular wave which leans to the left according to pick position [437, p. 82].
From a psychoacoustical perspective, it may be argued that the excitation noise burst described above is not perceptually uniform. The amplitude envelope for the noise burst provides noise-shaping in the time domain, and the LP filter provides noise-shaping in the frequency domain, but only at uniform resolution in time and frequency. Due to properties of hearing, it can be argued that multi-resolution noise shaping should be used. Thus, the LP fit for obtaining the noise-shaping filter should be carried out over a Bark frequency axis as in Fig. 8.19b. Since LP operates on the autocorrelation function, a warped autocorrelation can be computed simply as the inverse FFT of the warped squared-magnitude spectrum.
Body Factoring Example
Figure 8.14a shows the impulse response of a classical guitar body sampled at kHz. It was determined empirically that at least the first msec of this impulse response needs to be stored in the excitation table to produce a high quality synthetic guitar. Figure 8.14b shows the same impulse response after factoring out a single resonating mode near Hz (the main Helmholtz air mode). A close-up of the initial response is shown in Fig. 8.15. As can be seen, the residual response is considerably shorter than the original.
Figure 8.16a shows the guitar-body amplitude response, and Fig. 8.16b shows the response after the main Helmholtz air mode is removed by inverse filtering with a two-pole, two-zero filter. Figure 8.18 shows the same thing but with only a two-zero inverse filter; in this case the overall spectral shape is more affected.
Figure 8.19a shows the measured guitar-body amplitude response after warping to an approximate Bark frequency axis. Figure 8.19b shows the Bark-warped amplitude response after the main Helmholtz air mode is removed by inverse filtering. Fig. 8.20 shows the low-frequency close-up. The warped amplitude response was computed as the FFT of the impulse response of the FIR filter given by the original impulse response with each unit delay being replaced by a first-order allpass filter, as originally suggested in  and described further in .
Fig. 8.21 shows the corresponding results if a two-zero inverse filter is used rather than a two-pole, two-zero inverse filter, i.e., without the ``isolation poles,'' ( in Eq.(8.18) below). In this case, there is an overall ``EQ'' boosting high frequencies and attenuating low frequencies. However, comparing Figs. 8.18b and 8.21b, we see that the global EQ effect is less pronounced in the Bark-warped case. On the Bark frequency scale, it is much easier numerically to eliminate the main air mode.
The modal bandwidth used in the inverse filtering was chosen to be Hz which corresponds to a of for the main air mode. If the Bark-warping is done using a first-order conformal map , its inverse preserves filter order [428, pp. 61-67]. Applying the inverse warping to the parametric resonator drives its pole radius from in the Bark-warped plane to in the unwarped plane.
Note that if the inverse filter consists only of two zeros determined by the spectral peak parameters, other portions of the spectrum will be modified by the inverse filtering, especially at the next higher resonance, and in the linear trend of the overall spectral shape. To obtain a more localized mode extraction (useful when the procedure is to be repeated), we define the inverse filter as
where is the inverse filter determined by the peak frequency and bandwidth, and is the same polynomial with its roots contracted by the factor . If is close to but less than , the poles and zeros substantially cancel far away from the removed modal frequency so that the inverse filter has only a local effect on the frequency response. In the computed examples, was arbitrarily set to , but it is not critical.
Because the main air mode is extremely narrow, the probability of overflow can be reduced in fixed-point implementations by artificially dampening it. Reducing the of the main Helmholtz air mode from to corresponds to a decay time of about sec. This is consistent with the original desire to retain the first msec of the body impulse response.
For completeness, the Bark-warped impulse-responses are also shown in Figs. 8.22. Figure 8.22a shows the complete Bark-warped impulse response obtained by taking the inverse FFT of Fig. 8.19a, and Fig. 8.22b shows the shortened Bark-warped impulse response defined as the inverse FFT of Fig. 8.19b. We see that given a Bark-warped frequency axis (which more accurately represents what we hear), the time duration of the high-frequency modes is extended while the low-frequency modes are contracted in time duration. Thus, the modal decay times show less of a spread versus frequency. This also accounts for the reduced apparent shortening by the inverse filtering in the Bark-warped case.
Virtual Analog Example: Phasing
As mentioned in §5.4, the phaser, or phase shifter works by sweeping notches through a signal's short-time spectrum. The notches are classically spaced nonuniformly, while flangers employ uniformly spaced notches (§5.3). In this section, we look at using a chain of allpass filters to implement the phasing effect.
Phasing with First-Order Allpass Filters
The block diagram of a typical inexpensive phase shifter for guitar players is shown in Fig.126.96.36.199 It consists of a series chain of first-order allpass filters,9.21 each having a single time-varying parameter controlling the pole and zero location over time, plus a feedforward path through gain which is a fixed depth control. Thus, the delay line of the flanger is replaced by a string of allpass filters. (A delay line is of course an allpass filter itself.)
(In classic phaser circuits such as the Univibe, is used, but since there is an even number (four) of allpass stages, there is no difference.) In discrete time, the general first-order allpass has the transfer function
Figure 8.24a shows the phase responses of four first-order analog allpass filters with set to . Figure 8.24b shows the resulting normalized amplitude response for the phaser, for (unity feedfoward gain). The amplitude response has also been normalized by dividing by 2 so that the maximum gain is 1. Since there is an even number (four) of allpass sections, the gain at dc is . Put another way, the initial phase of each allpass section at dc is , so that the total allpass-chain phase at dc is . As frequency increases, the phase of the allpass chain decreases. When it comes down to , the net effect is a sign inversion by the allpass chain, and the phaser has a notch. There will be another notch when the phase falls down to . Thus, four first-order allpass sections give two notches. For each notch in the desired response we must add two new first-order allpass sections.
From Fig.8.24b, we observe that the first notch is near Hz. This happens to be the frequency at which the first allpass pole ``breaks,'' i.e., . Since the phase of a first-order allpass section at its break frequency is , the sum of the other three sections must be approximately . Equivalently, since the first section has ``given up'' radians of phase at , the other three allpass sections combined have given up radians as well (with the second section having given up more than the other two).
In practical operation, the break frequencies must change dynamically, usually periodically at some rate.
Classic Virtual Analog Phase Shifters
To create a virtual analog phaser, following closely the design of typical analog phasers, we must translate each first-order allpass to the digital domain. Working with the transfer function, we must map from plane to the plane. There are several ways to accomplish this goal . However, in this case, an excellent choice is the bilinear transform (see §7.3.2), defined by
where is chosen to map one particular frequency to exactly where it belongs. In this case, can be chosen for each section to map the break frequency of the section to exactly where it belongs on the digital frequency axis. The relation between analog frequency and digital frequency follows immediately from Eq.(8.20) as
Thus, given a particular desired break-frequency , we can set
Figure 8.25 shows the digital phaser response curves corresponding to the analog response curves in Fig.8.24. While the break frequencies are preserved by construction, the notches have moved slightly, although this is not visible from the plots. An overlay of the total phase of the analog and digital allpass chains is shown in Fig.8.26. We see that the phase responses of the analog and digital alpass chains diverge visibly only above 9 kHz. The analog phase response approaches zero in the limit as , while the digital phase response reaches zero at half the sampling rate, kHz in this case. This is a good example of when the bilinear transform performs very well.
In general, the bilinear transform works well to digitize feedforward analog structures in which the high-frequency warping is acceptable. When frequency warping is excessive, it can be alleviated by the use of oversampling; for example, the slight visible deviation in Fig.8.26 below 10 kHz can be largely eliminated by increasing the sampling rate by 15% or so. See the case of digitizing the Moog VCF for an example in which the presence of feedback in the analog circuit leads to a delay-free loop in the digitized system [479,477].
Phasing with 2nd-Order Allpass Filters
The allpass structure proposed in  provides a convenient means for generating nonuniformly spaced notches that are independently controllable to a high degree. An advantage of the allpass approach even in the case of uniformly spaced notches (which we call flanging, as introduced in §5.3) is that no interpolating delay line is needed.
The architecture of the phaser based on second-order allpasses is shown in Fig.8.27. It is identical to that in Fig.8.23 with each first-order allpass being replaced by a second-order allpass. I.e., replace in Fig.8.23 by , for , to get Fig.8.27. The phaser will have a notch wherever the phase of the allpass chain is at (180 degrees). It can be shown that these frequencies occur very close to the resonant frequencies of the allpass chain . It is therefore convenient to use a single conjugate pole pair in each allpass section, i.e., use second-order allpass sections of the form
and is the radius of each pole in the complex-conjugate pole pair, and pole angles are . The pole angle can be interpreted as where is the resonant frequency and is the sampling interval.
To move just one notch, the tuning of the pole-pair in the corresponding section is all that needs to be changed. Note that tuning affects only one coefficient in the second-order allpass structure. (Although the coefficient appears twice in the transfer function, it only needs to be used once per sample in a slightly modified direct-form implementation .)
The depth of the notches can be varied together by changing the gain of the feedforward path.
The bandwidth of individual notches is mostly controlled by the distance of the associated pole-pair from the unit circle. So to widen the notch associated with a particular allpass section, one may increase the ``damping'' of that section.
Finally, since the gain of the allpass string is unity (by definition of allpass filters), the gain of the entire structure is strictly bounded between 0 and 2. This property allows arbitrary notch controls to be applied without fear of the overall gain becoming ill-behaved.
As mentioned above, it is desirable to avoid exact harmonic spacing of the notches, but what is the ideal non-uniform spacing? One possibility is to space the notches according to the critical bands of hearing, since essentially this gives a uniform notch density with respect to ``place'' along the basilar membrane in the ear. There is no need to follow closely the critical-band structure, so that simple exponential spacing may be considered sufficiently perceptually uniform (corresponding to uniform spacing on a log frequency scale). Due to the immediacy of the relation between notch characteristics and the filter coefficients, the notches can easily be placed under musically meaningful control.
Virtual Musical Instruments