With the increasing use of frequency-domain techniques in audio signal processing applications such as audio compression, there is increasing emphasis on psychoacoustic-based spectral measures [274,17,113,118]. In particular, frequency warping is an important tool in spectral audio signal processing. For example, audio spectrograms (Chapter 7) can display signal energy versus time over a more perceptual, nonuniform, audio frequency axis (§7.3). Also, methods for digital filter design (Chapter 4) having no weighting function versus frequency, such as linear predictive coding (LPC) (§10.3), can be given an effective weighting function by means of frequency warping.
A common choice of audio frequency warping in audio applications is from a linear frequency scale to a Bark frequency scale (also called ``critical band rate'') [306,307,304,179,102,269]. The Bark scale is defined so that critical bands of hearing are uniformly spaced. (One critical bandwidth equals one Bark.)
A more recently developed psychoacoustic frequency scale, called the Equivalent Rectangular Bandwidth (ERB) scale, is based on different psychoacoustic experiments resulting in generally narrower critical bandwidth estimates.
This appendix, condensed from [269,268], describes a useful class of approximate Bark/ERB frequency warpings that may be implemented using a bilinear transform (a first-order conformal map of the unit circle to itself in the $z$ plane). Such warpings preserve order in filter-design applications. That is, the warping can be undone by the inverse bilinear transform which, because it is first order, does not change the order of the filter that was designed over the warped frequency axis.
Based on the results of many psychoacoustic experiments, the Bark scale is defined so that the critical bands of human hearing each have a width of one Bark. By representing spectral energy (in dB) over the Bark scale, a closer correspondence is obtained with spectral information processing in the ear (§7.3).
The Bark scale ranges from 1 to 24 Barks, corresponding to the first 24 critical bands of hearing. The published Bark band edges are given in Hertz as [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500]. The published band centers in Hertz are [50, 150, 250, 350, 450, 570, 700, 840, 1000, 1170, 1370, 1600, 1850, 2150, 2500, 2900, 3400, 4000, 4800, 5800, 7000, 8500, 10500, 13500]. These center-frequencies and bandwidths are to be interpreted as samplings of a continuous variation in the frequency response of the ear to a sinusoid or narrow-band noise process. That is, critical-band-shaped masking patterns should be seen as forming around specific stimuli in the ear rather than being associated with a specific fixed filter bank in the ear.
Note that since the Bark scale is defined only up to 15.5 kHz, the highest sampling rate for which the Bark scale is defined up to the Nyquist limit, without requiring extrapolation, is 31 kHz. The 25th Bark band certainly extends above 19 kHz (the sum of the 24th Bark band edge and the 23rd critical bandwidth), so that a sampling rate of 40 kHz is implicitly supported by the data. We have extrapolated the Bark band-edges in our work, appending the values [20500, 27000] so that sampling rates up to 54 kHz are defined. While human hearing generally does not extend above 20 kHz, audio sampling rates as high as 48 kHz or higher are common in practice.
The Bark scale is defined above in terms of frequency in Hz versus Bark number. For computing optimal bilinear transformations, it is preferable to optimize the fit to the inverse of this map, i.e., Barks versus Hz, so that the mapping error will be measured in Barks rather than Hz.
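As a minimal sketch of the "Barks versus Hz" map, the following Python places band edge number k (counting from zero) at k Barks and interpolates linearly in between. Both the edge-to-Bark convention and the linear interpolation are assumptions of this sketch, and the names `hz_to_bark` and `normalized_bark` are hypothetical.

```python
import math
from bisect import bisect_right

# Published Bark band edges in Hz, followed by the two extrapolated edges.
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
                 9500, 12000, 15500, 20500, 27000]

def hz_to_bark(f_hz):
    """Barks versus Hz: edge k sits at k Barks (assumed), linear in between."""
    if not 0 <= f_hz <= BARK_EDGES_HZ[-1]:
        raise ValueError("frequency outside the tabulated Bark range")
    # Index of the band containing f_hz (the last edge belongs to the last band)
    k = min(bisect_right(BARK_EDGES_HZ, f_hz) - 1, len(BARK_EDGES_HZ) - 2)
    lo, hi = BARK_EDGES_HZ[k], BARK_EDGES_HZ[k + 1]
    return k + (f_hz - lo) / (hi - lo)

def normalized_bark(f_hz, fs_hz):
    """Bark warping in radians per sample: dc maps to 0, fs/2 maps to pi."""
    return math.pi * hz_to_bark(f_hz) / hz_to_bark(fs_hz / 2.0)
```

Measuring the fit error on the output of `hz_to_bark` expresses it directly in Barks, which is the reason given above for fitting this direction of the map.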
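A general bilinear transformation between a $z$ plane and a $\zeta$ plane can be written in the cross-ratio form

$$\frac{(\zeta-\zeta_1)(\zeta_2-\zeta_3)}{(\zeta-\zeta_3)(\zeta_2-\zeta_1)} = \frac{(z-z_1)(z_2-z_3)}{(z-z_3)(z_2-z_1)} \qquad \mbox{(E.2)}$$

It can be seen that choosing three specific points $z_1, z_2, z_3$ and their images $\zeta_1, \zeta_2, \zeta_3$ determines the mapping for all $z$ and $\zeta$.
Bilinear transformations map circles and lines into circles and lines (lines being viewed as circles passing through the point at infinity). In digital audio, where both domains are ``$z$ planes,'' we normally want to map the unit circle to itself, with dc mapping to dc ($z_1 = \zeta_1 = 1$) and half the sampling rate mapping to half the sampling rate ($z_3 = \zeta_3 = -1$). Making these substitutions in (E.2) leaves us with transformations of the form

$$\zeta = A_\rho(z) = \frac{z-\rho}{1-\rho z}$$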
The constant $\rho$ provides one remaining degree of freedom which can be used to map any particular frequency $\omega$ (corresponding to the point $e^{j\omega}$ on the unit circle) to a new location $a(\omega)$. All other frequencies will be warped accordingly. Note that this class of ``circle to circle'' bilinear transformations takes the form of the transfer function of an allpass filter. We therefore call it an ``allpass transformation''. The ``allpass coefficient'' $\rho$ can be written in terms of the frequencies $\omega$ and $a(\omega)$ as

$$\rho = \frac{\sin\{[a(\omega)-\omega]/2\}}{\sin\{[a(\omega)+\omega]/2\}}$$

In this form, it is clear that $\rho$ is real, and that the inverse of the allpass map $A_\rho(z) = (z-\rho)/(1-\rho z)$ is $A_{-\rho}$. Also, since $\omega, a(\omega) \in (0,\pi)$, and $a(\omega) > \omega$ for an audio warping (where low frequencies must be ``stretched out'' relative to high frequencies), we have $\rho > 0$ for audio-type mappings from the $z$ plane to the $\zeta$ plane.
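The properties just listed can be checked numerically. The sketch below assumes the first-order allpass form $\zeta = (z-\rho)/(1-\rho z)$ and an arbitrary example pair of frequencies; the function names are hypothetical.

```python
import cmath
import math

def allpass(z, rho):
    """First-order allpass map: zeta = (z - rho) / (1 - rho*z)."""
    return (z - rho) / (1 - rho * z)

def rho_from_pair(w, a):
    """Coefficient sending frequency w to frequency a (radians per sample)."""
    return math.sin((a - w) / 2) / math.sin((a + w) / 2)

# Example: stretch 0.2*pi up to 0.5*pi (an arbitrary audio-type warping)
rho = rho_from_pair(0.2 * math.pi, 0.5 * math.pi)
z = cmath.exp(1j * 0.2 * math.pi)
zeta = allpass(z, rho)

print(rho)                   # real, between 0 and 1
print(cmath.phase(zeta))     # the requested target frequency
print(allpass(zeta, -rho))   # the inverse map returns z
```

Because dc and half the sampling rate are fixed points of the map, the single real coefficient is the only remaining freedom.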
Optimal Bilinear Bark Warping
In the following, a simple direct-form expression is developed for the map parameter $\rho$ giving the best least-squares fit to a Bark scale for a chosen sampling rate. As Fig.E.1 shows, the error is so small that the solution is also very close to the optimal Chebyshev fit. In fact, the optimal least-squares warping is within 0.04 Bark of the optimal Chebyshev warping. Since the experimental uncertainty when measuring critical bands is on the order of a tenth of a Bark or more [178,181,251,298], we consider the optimal Chebyshev and least-squares maps to be essentially equivalent psychoacoustically.
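Our goal is to find the allpass coefficient $\rho$ such that the frequency mapping

$$a(\omega) = \angle A_\rho(e^{j\omega}), \qquad A_\rho(z) = \frac{z-\rho}{1-\rho z},$$

best approximates the Bark scale $b(\omega)$ for a given sampling rate $f_s$. (Note that the frequencies $\omega$, $a(\omega)$, and $b(\omega)$ are all expressed in radians per sample, so that a frequency of half of the sampling rate corresponds to a value of $\pi$.)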
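For computation it helps to have the warped frequency in closed form. Taking the phase of $(e^{j\omega}-\rho)/(1-\rho e^{j\omega})$ gives $a(\omega) = \omega + 2\arctan[\rho\sin\omega/(1-\rho\cos\omega)]$; the sketch below (function name hypothetical) implements that expression.

```python
import math

def warped_frequency(w, rho):
    """a(w) = angle of the first-order allpass at e^{jw}.

    Valid for 0 <= w <= pi and |rho| < 1; all frequencies in rad/sample.
    """
    return w + 2.0 * math.atan2(rho * math.sin(w), 1.0 - rho * math.cos(w))

# With rho > 0, low frequencies are stretched upward:
for w in (0.1, 0.5, 1.0, 2.0, 3.0):
    print(w, warped_frequency(w, 0.7))
```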
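Using squared frequency errors to gauge the fit between $a(\omega)$ and its Bark-warped counterpart $b(\omega)$, the optimal mapping-parameter $\rho^*$ may be written as

$$\rho^* = \arg\min_\rho \left\| a(\omega) - b(\omega) \right\|$$

where $\|\cdot\|$ represents the $L^2$ norm. (The superscript `$*$' denotes optimality in some sense.) Unfortunately, the frequency error

$$\epsilon_A = a(\omega) - b(\omega)$$

is nonlinear in $\rho$, and its norm is not easily minimized directly. It turns out, however, that a related error,

$$\epsilon_C = e^{ja(\omega)} - e^{jb(\omega)},$$

has a norm which is more amenable to minimization. The first issue we address is how the minimizers of $\|\epsilon_A\|$ and $\|\epsilon_C\|$ are related.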
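Denote by $\zeta_a$ and $\zeta_b$ the complex representations of the frequencies $a(\omega)$ and $b(\omega)$ on the unit circle,

$$\zeta_a = e^{ja(\omega)}, \qquad \zeta_b = e^{jb(\omega)}.$$

As seen in Fig.E.2, the absolute frequency error $|\epsilon_A| = |a(\omega)-b(\omega)|$ is the arc length between the points $\zeta_a$ and $\zeta_b$, whereas $|\epsilon_C| = |\zeta_a-\zeta_b|$ is the chord length or distance:

$$|\epsilon_C| = 2\left|\sin(\epsilon_A/2)\right|$$

The desired arc length error gives more weight to large errors than the chord length error; however, in the presence of small discrepancies between $\zeta_a$ and $\zeta_b$, the absolute errors are very similar,

$$|\epsilon_C| = |\epsilon_A| - \frac{|\epsilon_A|^3}{24} + \cdots \approx |\epsilon_A|$$

Accordingly, essentially the same $\rho^*$ results from minimizing $\|\epsilon_A\|$ or $\|\epsilon_C\|$ when the fit is uniformly good over frequency.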
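To see how close the arc and chord errors are in practice, the following sketch compares the two for a pair of points on the unit circle. The 0.1 radian discrepancy is an arbitrary illustrative value, and the function name is hypothetical.

```python
import cmath
import math

def chord_error(a, b):
    """Distance between e^{ja} and e^{jb} on the unit circle."""
    return abs(cmath.exp(1j * a) - cmath.exp(1j * b))

a, b = 1.1, 1.0                      # warped and target frequencies, rad/sample
arc = abs(a - b)                     # arc-length error
chord = chord_error(a, b)            # equals 2*|sin(arc/2)|
relative_gap = (arc - chord) / arc   # roughly arc**2 / 24 for small errors
print(arc, chord, relative_gap)
```

The relative gap shrinks quadratically with the error, which is why the two minimizers nearly coincide when the fit is good everywhere.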
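The error $\epsilon_C$ is also nonlinear in the parameter $\rho$, and to find its norm minimizer, an equation error is introduced, as is common practice in developing solutions to nonlinear system identification problems. Consider mapping the frequency $z = e^{j\omega}$ via the allpass transformation $A_\rho(z)$,

$$\zeta_a = \frac{z-\rho}{1-\rho z}$$

where $\zeta_a = e^{ja(\omega)}$. Now, multiply this expression by the denominator $(1-\rho z)$, and substitute $\zeta_a = \zeta_b + \epsilon_C$, where $\zeta_b = e^{jb(\omega)}$ and $\epsilon_C = \zeta_a - \zeta_b$, to get

$$(1-\rho z)(\zeta_b + \epsilon_C) = z - \rho$$

Rearranging terms, we have

$$(z-\zeta_b) - \rho\,(1 - z\zeta_b) = \epsilon_E$$

where $\epsilon_E$ is an equation error defined by

$$\epsilon_E = (1-\rho z)\,\epsilon_C$$

Because $\epsilon_E$ is linear in $\rho$, its weighted norm over a set of frequencies $\omega_k$ can be minimized in closed form. Dividing the rearranged equation by $-2j\,e^{j[b(\omega)+\omega]/2}$ gives the real-valued expression $\sin\{[b(\omega)-\omega]/2\} - \rho\sin\{[b(\omega)+\omega]/2\}$, whose magnitude is $|\epsilon_E|/2$.
If the weighting matrix $V$ is diagonal with kth diagonal element $v(\omega_k)$, then the weighted least-squares solution reduces to

$$\rho^* = \frac{\sum_k v(\omega_k)\,\sin\{[b(\omega_k)-\omega_k]/2\}\,\sin\{[b(\omega_k)+\omega_k]/2\}}{\sum_k v(\omega_k)\,\sin^2\{[b(\omega_k)+\omega_k]/2\}}$$

The kth diagonal element of an optimal diagonal weighting matrix $V$ is given by

$$v(\omega_k) = \frac{1}{\left|1-\rho e^{j\omega_k}\right|^2} = \frac{1}{1+\rho^2-2\rho\cos(\omega_k)}$$

since $|\epsilon_C| = |\epsilon_E|/|1-\rho z|$.
Note that the desired weighting depends on the unknown map parameter $\rho$. To overcome this difficulty, we suggest first estimating $\rho^*$ using $V = I$, where $I$ denotes the identity matrix, and then computing $\rho^*$ using the weighting above based on the unweighted solution. This is analogous to the Steiglitz-McBride algorithm for converting an equation-error minimizer to the more desired ``output-error'' minimizer using an iteratively computed weight function.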
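The two-pass procedure can be sketched as follows. The sketch assumes the real-valued form of the equation error, $\sin[(b-\omega)/2] - \rho\sin[(b+\omega)/2]$, and the weight $1/|1-\rho e^{j\omega}|^2$ that converts equation error into chord error; the function names are hypothetical.

```python
import math

def equation_error_rho(w, b, v=None):
    """Closed-form minimizer of the weighted equation error.

    w, b: frequencies and target warped frequencies (radians per sample).
    v:    optional positive weights; None means unit weights (V = I).
    """
    if v is None:
        v = [1.0] * len(w)
    num = sum(vk * math.sin((bk - wk) / 2) * math.sin((bk + wk) / 2)
              for vk, wk, bk in zip(v, w, b))
    den = sum(vk * math.sin((bk + wk) / 2) ** 2
              for vk, wk, bk in zip(v, w, b))
    return num / den

def weighted_equation_error_rho(w, b):
    """Unweighted estimate first, then one reweighted pass."""
    rho0 = equation_error_rho(w, b)
    v = [1.0 / (1.0 + rho0 * rho0 - 2.0 * rho0 * math.cos(wk)) for wk in w]
    return equation_error_rho(w, b, v)
```

When the target is itself an exact allpass warping, the equation error vanishes and both passes recover the true coefficient, which makes a convenient sanity check.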
- Minimize the peak arc-length error at each sampling rate to obtain the optimal Chebyshev allpass parameter $\rho^*_\infty(f_s)$.
- Minimize the sum of squared arc-length errors to obtain the optimal least-squares allpass parameter $\rho^*_2(f_s)$.
- Use the closed-form weighted equation-error solution (E.3.1) computed twice, first with $V = I$, and second with $V$ set from (E.3.1) to obtain the optimal ``weighted equation error'' solution $\rho^*_E(f_s)$.
- Fit the function $\gamma_1\left[(2/\pi)\arctan(\gamma_2 f_s)\right]^{1/2} + \gamma_3$ to the optimal Chebyshev allpass parameter $\rho^*_\infty(f_s)$ via Chebyshev optimization with respect to $\gamma_1$, $\gamma_2$, and $\gamma_3$. We will refer to the resulting function as the ``arctangent approximation'' (or, less formally, the ``Barktan formula''), and note that it is easily computed directly from the sampling rate.
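The first two criteria in the list above can be illustrated with a brute-force search at a 31 kHz sampling rate. This is a sketch under stated assumptions, not the optimization used to produce the figures: the fit is evaluated only at the published band edges, edge number k is taken to lie at k Barks, and the coefficient is searched on a grid of step 0.001.

```python
import math

# Published Bark band edges in Hz; the last edge is fs/2 for fs = 31 kHz.
EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
            1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
            9500, 12000, 15500]
N_BARK = len(EDGES_HZ) - 1    # 24 Barks at 15.5 kHz

def warped(w, rho):
    """Allpass frequency map a(w), radians per sample."""
    return w + 2.0 * math.atan2(rho * math.sin(w), 1.0 - rho * math.cos(w))

def arc_errors_barks(rho):
    """Errors a - b, converted to Barks, at the interior band edges."""
    errs = []
    for k in range(1, N_BARK):
        w = math.pi * EDGES_HZ[k] / EDGES_HZ[-1]
        b = math.pi * k / N_BARK
        errs.append((warped(w, rho) - b) * N_BARK / math.pi)
    return errs

def peak_error(rho):
    return max(abs(e) for e in arc_errors_barks(rho))

def squared_error(rho):
    return sum(e * e for e in arc_errors_barks(rho))

CANDIDATES = [i / 1000.0 for i in range(1000)]
rho_cheb = min(CANDIDATES, key=peak_error)     # Chebyshev (minimax) choice
rho_ls = min(CANDIDATES, key=squared_error)    # least-squares choice
print(rho_cheb, peak_error(rho_cheb))
print(rho_ls, peak_error(rho_ls))
```

A one-dimensional grid search is crude but robust here, since there is only a single real parameter to choose.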
The peak and rms frequency-mapping errors are plotted versus sampling rate in Fig.E.4. Peak and rms errors in Barks are plotted for all four cases (Chebyshev, least squares, weighted equation-error, and arctangent approximation). The conformal-map fit to the Bark scale is generally excellent in all cases. We see that the rms error is essentially identical in the first three cases, although the Chebyshev rms error is visibly larger below 10 kHz. Similarly, the peak error is essentially the same for least squares and weighted equation error, with the Chebyshev case being able to shave almost 0.1 Bark from the maximum error at high sampling rates. The arctangent formula shows up to a tenth of a Bark larger peak error at sampling rates 15-30 and 54 kHz, but otherwise it performs very well; at 41 kHz and below 12 kHz the arctangent approximation is essentially optimal in all senses considered.
At sampling rates up to the maximum non-extrapolated sampling rate of 31 kHz, the peak mapping errors are all much less than one Bark (0.64 Barks for the Chebyshev case and 0.67 Barks for the two least squares cases). The mapping errors in Barks can be seen to increase almost linearly with sampling rate. However, the irregular nature of the Bark-scale data results in a nonmonotonic relationship at lower sampling rates.
The specific frequency mapping errors versus frequency at the 31 kHz sampling rate (the same case shown in Fig.E.1) are plotted in Fig.E.5. Again, all four cases are overlaid, and again the least squares and weighted equation-error cases are essentially identical. By forcing equal and opposite peak errors, the Chebyshev case is able to lower the peak error from 0.67 to 0.64 Barks. A difference of 0.03 Barks is probably insignificant for most applications. The peak errors occur at 1.3 kHz and 8.8 kHz where the error is approximately 2/3 Bark. The arctangent formula peak error is 0.73 Barks at 8.8 kHz, but in return, its secondary error peak at 1.3 kHz is only 0.55 Barks. In some applications, such as when working with oversampled signals, higher accuracy at low frequencies at the expense of higher error at very high frequencies may be considered a desirable tradeoff.
We see that the mapping falls ``behind'' a bit as frequency increases from zero to 1.3 kHz, mapping linear frequencies slightly below the desired corresponding Bark values; then, the mapping ``catches up,'' reaching an error of 0 Barks near 3 kHz. Above 3 kHz, it gets ``ahead'' slightly, with frequencies in Hz being mapped a little too high, reaching the positive error peak at 8.8 kHz, after which it falls back down to zero error at $f_s/2$. (Recall that dc and half the sampling-rate are always points of zero error by construction.)
The slope of the frequency versus warped-frequency curve can be interpreted as being proportional to critical bandwidth, since a unit interval (one Bark) on the warped-frequency axis is magnified by the slope to restore the band to its original size (one critical bandwidth). It is therefore interesting to look at the relative slope error, i.e., the error in the slope of the frequency mapping divided by the ideal Bark-map slope. We interpret this error measure as the relative bandwidth-mapping error (RBME). The RBME is plotted in Fig.E.6 for a 31 kHz sampling rate. The worst case is 21% for the Chebyshev case and 20% for both least-squares cases. When the mapping coefficient is explicitly optimized to minimize RBME, the results of Fig.E.7 are obtained: the Chebyshev peak error drops from 21% down to 18%, while the least-squares cases remain unchanged at 20% maximum RBME. A 3% change in RBME is comparable to the 0.03 Bark peak-error reduction seen in Fig.E.5 when using the Chebyshev norm instead of the $L^2$ norm; again, such a small difference is not likely to be significant in most applications.
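The RBME can be computed from the slope of the allpass map, which has the closed form $da/d\omega = (1-\rho^2)/(1+\rho^2-2\rho\cos\omega)$. The sketch below is one reading of the definition given in the text; the function names are hypothetical, and the ideal critical bandwidth and the Bark value at half the sampling rate are inputs the caller must supply (for instance from the published band edges).

```python
import math

def warp_slope(w, rho):
    """Slope da/dw of the allpass frequency map at frequency w."""
    return (1.0 - rho * rho) / (1.0 + rho * rho - 2.0 * rho * math.cos(w))

def rbme(f_hz, fs_hz, rho, cb_hz, bark_at_nyquist):
    """Relative bandwidth-mapping error at f_hz.

    cb_hz:           ideal critical bandwidth (Hz per Bark) at f_hz
    bark_at_nyquist: Bark value assigned to fs/2
    """
    w = 2.0 * math.pi * f_hz / fs_hz
    # One Bark on the normalized warped axis, mapped back to rad/sample
    mapped_bw = (math.pi / bark_at_nyquist) / warp_slope(w, rho)
    # The ideal critical bandwidth expressed in rad/sample
    ideal_bw = 2.0 * math.pi * cb_hz / fs_hz
    return (mapped_bw - ideal_bw) / ideal_bw

# Example call: the 920-1080 Hz band (160 Hz wide) at fs = 31 kHz,
# with an illustrative coefficient of 0.7
print(rbme(1000.0, 31000.0, 0.7, 160.0, 24.0))
```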
Similar observations are obtained at other sampling rates, as shown in Fig.E.8. Near a 10 kHz sampling rate, the Chebyshev RBME is reduced from 17% when minimizing absolute error in Barks (not shown in any figure) to around 12% by explicitly minimizing the RBME, and this is the sampling-rate range of maximum benefit. At 15.2, 19, 41, and 54 kHz sampling rates, the difference is on the order of only 1%. Other cases generally lie between these extremes. The arctangent formula generally falls between the Chebyshev and optimal least-squares cases, except at the highest (extrapolated) sampling rate 54 kHz. The rms error is very similar in all four cases, although the Chebyshev case has a little larger rms error near a 10 kHz sampling rate, and the arctangent case gives a noticeably larger rms error at 54 kHz.
In one study, young normal listeners exhibited a standard deviation in their measured auditory bandwidths (based on notched-noise masking experiments) on the order of 10% of center frequency. Therefore, a 20% peak error in mapped bandwidth (typical for sampling rates approaching 40 kHz) could be considered significant. However, the range of auditory-filter bandwidths measured in 93 young normal subjects at 2 kHz was 230 to 410 Hz, which is -26% to +32% relative to 310 Hz. In another study, 40 subjects were measured, yielding auditory-filter bandwidths between -33% and +65%, with a standard deviation of 18%. It may thus be concluded that a worst-case mapping error on the order of 20%, while probably detectable by ``golden ears'' listeners, lies well within the range of experimental deviations in the empirical measurement of auditory bandwidth.
As a worst-case example of how the 18% peak bandwidth-mapping error in Fig.E.7 might correspond to an audible distortion, consider one critical band of noise centered at the frequency of maximum negative mapping error, scaled to be the same loudness as a single critical band of noise centered at the frequency of maximum positive error. The systematic nature of the mapping error results in a narrowing of the lower band and expansion of the upper band by about 1.7 dB. As a result, over the warped frequency axis, the upper band will be effectively emphasized over the lower band by about 3 dB.
This subsection provides further details on the arctangent approximation for the optimal allpass coefficient as a function of sampling rate. Compared with other spline or polynomial approximations, the arctangent form

$$\rho_\gamma(f_s) = \gamma_1\left[\frac{2}{\pi}\arctan(\gamma_2 f_s)\right]^{1/2} + \gamma_3$$

was found to provide a more parsimonious expression at a given accuracy level. The idea was that the arctangent function provided a mapping from the interval $[0,\infty)$, the domain of $f_s$, to the interval $[0,1)$, the range of $\rho$. The additive component $\gamma_3$ allowed $\rho_\gamma(f_s)$ to be zero at smaller sampling rates, where the Bark scale is linear with frequency. As an additional benefit, the arctangent expression was easily inverted to give sampling rate $f_s$ in terms of the allpass coefficient $\rho$:

$$f_s = \frac{1}{\gamma_2}\tan\left[\frac{\pi}{2}\left(\frac{\rho-\gamma_3}{\gamma_1}\right)^2\right]$$

To obtain the optimal arctangent form $\rho^*_\gamma(f_s)$, the expression for $\rho_\gamma(f_s)$ above was optimized with respect to its free parameters $\gamma = (\gamma_1, \gamma_2, \gamma_3)$ to match the optimal Chebyshev allpass coefficient $\rho^*_\infty(f_s)$ as a function of sampling rate:

$$\gamma^* = \arg\min_\gamma \left\| \rho^*_\infty(f_s) - \rho_\gamma(f_s) \right\|_\infty$$

For a Bark warping, the optimized arctangent formula was found to be

$$\rho^*_\gamma(f_s) = 1.0674\left[\frac{2}{\pi}\arctan(0.06583\,f_s)\right]^{1/2} - 0.1916 \qquad \mbox{(E.3.5)}$$

where $f_s$ is expressed in units of kHz. This formula is plotted along with the various optimal curves in Fig.E.3a, and the approximation error is shown in Fig.E.3b. It is extremely accurate below 15 kHz and near 40 kHz, and adds generally less than 0.1 Bark to the peak error at other sampling rates. The rms error versus sampling rate is very close to optimal at all sampling rates, as Fig.E.4 also shows.
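A minimal sketch of the arctangent approximation and its inverse follows. The three constants are the values commonly quoted for the Bark fit, with the sampling rate in kHz; they are an assumption of this sketch and should be checked against the formula in use. The function names are hypothetical.

```python
import math

# (gamma1, gamma2, gamma3) for the Bark fit; assumed values, fs in kHz
GAMMA_BARK = (1.0674, 0.06583, -0.1916)

def rho_arctan(fs_khz, gamma=GAMMA_BARK):
    """Allpass coefficient from sampling rate via the arctangent form."""
    g1, g2, g3 = gamma
    return g1 * math.sqrt((2.0 / math.pi) * math.atan(g2 * fs_khz)) + g3

def fs_from_rho(rho, gamma=GAMMA_BARK):
    """Inverse of rho_arctan: sampling rate in kHz from the coefficient."""
    g1, g2, g3 = gamma
    return math.tan(0.5 * math.pi * ((rho - g3) / g1) ** 2) / g2

rho_44k = rho_arctan(44.1)
print(rho_44k, fs_from_rho(rho_44k))
```

Keeping the three parameters in a tuple makes it easy to substitute the constants of a different fit, such as one for another auditory scale or optimality criterion.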
When the optimality criterion is chosen to minimize relative bandwidth mapping error (relative map slope error), the arctangent formula optimization yields
The performance of this formula is shown in Fig.E.8. It tends to follow the performance of the optimal least squares map parameter even though the peak parameter error was minimized relative to the optimal Chebyshev map. At 54 kHz there is an additional 3% bandwidth error due to the arctangent approximation, and near 10 kHz the additional error is about 4%; at other sampling rates, the performance of the RBME arctangent approximation is better, and like (E.3.5), it is extremely accurate at 41 kHz.
- warping the desired frequency response, thus ``horizontally stretching'' the more important low-frequency region of the spectrum,
- performing a filter design over the warped frequency axis, and
- transforming the resulting filter to eliminate the frequency warp, returning it to the normal frequency axis.
We conclude discussion of the Bark bilinear transform with the filter design example of Fig.E.9. A pole-zero filter of fixed order was fit using Prony's method to the equalization function plotted in the figure as a dashed line. Prony's method was applied normally over a uniformly sampled linear frequency grid in the example of Fig.E.9a, and over an approximate Bark-scale axis in the example of Fig.E.9b. The procedure in the Bark-scale case was as follows:
- The optimal allpass coefficient $\rho^*(f_s)$ was found using the results above.
- The desired frequency response $H(e^{j\omega})$, defined on a linear frequency axis $\omega$, was warped to an approximate Bark scale $a(\omega)$ using the Bark bilinear transform.
- A parametric ARMA model was fit to the desired Bark-warped frequency response over the unit circle in the $\zeta$ plane.
- Finally, the inverse Bark bilinear transform was used to ``unwarp'' the modeled system to a linear frequency axis.
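The final unwarping step in the procedure above can be sketched in terms of poles and zeros. Since $\zeta = (z-\rho)/(1-\rho z)$, a root at $\zeta_0$ in the warped plane moves to $z_0 = (\zeta_0+\rho)/(1+\rho\zeta_0)$, and the filter order is unchanged. The sketch assumes equal numbers of poles and zeros so that the $(1-\rho z)$ factors cancel; the function names and example roots are hypothetical, and this is not the code used for Fig.E.9.

```python
import cmath

def unwarp_root(zeta0, rho):
    """Map one pole or zero from the warped (zeta) plane to the z plane."""
    return (zeta0 + rho) / (1 + rho * zeta0)

def unwarp_filter(zeros, poles, rho):
    """Return (zeros, poles, gain) of the unwarped filter of the same order."""
    if len(zeros) != len(poles):
        raise ValueError("this sketch assumes equal numbers of poles and zeros")
    gain = 1.0
    for q in zeros:
        gain *= (1 + rho * q)
    for p in poles:
        gain /= (1 + rho * p)
    return ([unwarp_root(q, rho) for q in zeros],
            [unwarp_root(p, rho) for p in poles], gain)

def response(zeros, poles, z):
    """Evaluate prod(z - zeros) / prod(z - poles) at the point z."""
    num = 1.0
    for q in zeros:
        num *= (z - q)
    den = 1.0
    for p in poles:
        den *= (z - p)
    return num / den

# Example warped-domain design (arbitrary illustrative roots)
rho = 0.7
warped_zeros = [0.5 + 0.3j, 0.5 - 0.3j]
warped_poles = [0.8 * cmath.exp(0.4j), 0.8 * cmath.exp(-0.4j)]
zeros, poles, gain = unwarp_filter(warped_zeros, warped_poles, rho)
```

Because the allpass map sends the unit disk to itself, stable poles in the warped domain remain stable after unwarping.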
Referring to Fig.E.9, it is clear that the warped solution provides a better overall fit than the direct solution which sacrifices accuracy below kHz to achieve a tighter fit above kHz. In some part, the spacing of spectral features is responsible for the success of the Bark-warped model in this particular example. However, we generally recommend using the Bark bilinear transform to design audio filters, since doing so weights the error norm (for norms other than Chebyshev types) in a way which gives equal importance to matching features having equal Bark bandwidths. Even in the case of Chebyshev optimization, auditory warping appears to improve the numerical conditioning of the filter design problem; this applies also to optimization under the Hankel norm which includes an optimal Chebyshev design internally as an intermediate step. Further filter-design examples, including more on the Hankel-norm case, may be found in .
Equivalent Rectangular Bandwidth
It also turns out that a first-order conformal map (bilinear transform) can provide a good match to the ERB scale  as well. Moore and Glasberg  have revised Zwicker's loudness model to better explain (1) how equal-loudness contours change as a function of level, (2) why loudness remains constant as the bandwidth of a fixed-intensity sound increases up to the critical bandwidth, and (3) the loudness of partially masked sounds. The modification that is relevant here is the replacement of the Bark scale by the equivalent rectangular bandwidth (ERB) scale. The ERB of the auditory filter is assumed to be closely related to the critical bandwidth, but it is measured using the notched-noise method [205,206,251,181,87] rather than on classical masking experiments involving a narrow-band masker and probe tone [306,307,304]. As a result, the ERB is said not to be affected by the detection of beats or intermodulation products between the signal and masker. Since this scale is defined analytically, it is also more smoothly behaved than the Bark scale data.
At moderate sound levels, the ERB in Hz is defined by

$$\mbox{ERB}(f) = 0.108\,f + 24.7$$

where $f$ is center-frequency in Hz, normally in the range 100 Hz to 10 kHz. The ERB is generally narrower than the classical critical bandwidth (CB), being about 11% of center frequency at high frequencies, and leveling off to about 25 Hz at low frequencies. The classical CB, on the other hand, is approximately 20% of center frequency, leveling off to 100 Hz below 500 Hz. An overlay of ERB and CB bandwidths is shown in Fig.E.10. Also shown is the approximate classical CB bandwidth, as well as a more accurate analytical expression for Bark bandwidth vs. Hz. Finally, note that there is a frequency interval over which the psychophysical ERB agrees well with the directly physical audio filter bandwidths defined in terms of place along the basilar membrane [96, p. 2601].
The ERB scale is defined as the number of ERBs below each frequency

$$\mbox{ERBS}(f) = 21.4\,\log_{10}(0.00437\,f + 1)$$

for $f$ in Hz. An overlay of the normalized Bark and ERB frequency warpings is shown in Fig.E.11. The ERB warping is determined by scaling the inverse of (E.5), evaluated along a uniform frequency grid from zero to the number of ERBs at half the sampling rate, so that dc maps to zero and half the sampling rate maps to $\pi$.
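The ERB bandwidth and ERB scale are simple enough to sketch directly. The constants below are the commonly quoted Glasberg and Moore values (bandwidth $0.108f + 24.7$ Hz and scale $21.4\log_{10}(0.00437f+1)$); treat them as an assumption if a different ERB definition is in use. The function names are hypothetical.

```python
import math

def erb_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz at center frequency f_hz."""
    return 0.108 * f_hz + 24.7

def erb_scale(f_hz):
    """Number of ERBs below f_hz."""
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)

def normalized_erb(f_hz, fs_hz):
    """ERB warping in radians per sample: dc maps to 0, fs/2 maps to pi."""
    return math.pi * erb_scale(f_hz) / erb_scale(fs_hz / 2.0)

print(erb_hz(1000.0), erb_scale(1000.0), normalized_erb(1000.0, 31000.0))
```

The slope of `erb_scale` is approximately `1 / erb_hz`, which is what "number of ERBs below each frequency" means.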
Proceeding in the same manner as for the Bark-scale case, allpass coefficients giving a best approximation to the ERB-scale warping were computed for sampling rates near twice the Bark band edge frequencies (chosen to facilitate comparison between the ERB and Bark cases). The resulting optimal map coefficients are shown in Fig.E.12. The allpass parameter increases with increasing sampling rate, as in the Bark-scale case, but it covers a significantly narrower range, as a comparison with Fig.E.3 shows. Also, the Chebyshev solution is now systematically larger than the least-squares solutions, and the least-squares and weighted equation-error cases are no longer essentially identical. The fact that the arctangent formula is optimized for the Chebyshev case is much more evident in the error plot of Fig.E.12b than it was in Fig.E.3b for the Bark warping parameter.
The peak and rms mapping errors are plotted versus sampling rate in Fig.E.13. Compare these results for the ERB scale with those for the Bark scale in Fig.E.4. The ERB map errors are plotted in Barks to facilitate comparison. The rms error of the conformal map fit to the ERB scale increases nearly linearly with log-sampling-rate. The ERB-scale error increases very smoothly with frequency while the Bark-scale error is non-monotonic (see Fig.E.4). The smoother behavior of the ERB errors appears due in part to the fact that the ERB scale is defined analytically while the Bark scale is defined more directly in terms of experimental data: The Bark-scale fit is so good as to be within experimental deviation, while the ERB-scale fit has a much larger systematic error component. The peak error in Fig.E.13 also grows close to linearly on a log-frequency scale and is similarly two to three times the Bark-scale errors of Fig.E.4.
The frequency mapping errors are plotted versus frequency in Fig.E.14 for a sampling rate of 31 kHz. Unlike the Bark-scale case in Fig.E.5, there is now a visible difference between the weighted equation-error and optimal least-squares mappings for the ERB scale. The figure shows also that the peak error when warping to an ERB scale is about three times larger than the peak error when warping to the Bark scale, growing from 0.64 Barks to 1.9 Barks. The locations of the peak errors are also at lower frequencies (moving from 1.3 and 8.8 kHz in the Bark-scale case to 0.7 and 8.2 kHz in the ERB-scale case).
The optimal relative bandwidth-mapping error (RBME) for the ERB case is plotted in Fig.E.15 for a 31 kHz sampling rate. The peak error has grown from close to 20% for the Bark-scale case to more than 60% for the ERB case. Thus, frequency intervals are mapped to the ERB scale with up to three times as much relative error (60%) as when mapping to the Bark scale (20%). The continued narrowing of the auditory filter bandwidth as frequency decreases on the ERB scale results in the conformal map not being able to supply sufficient stretching of the low-frequency axis. The Bark scale case, on the other hand, is much better provided at low frequencies by the first-order conformal map.
Figure E.16 shows the rms and peak ERB RBME as a function of sampling rate. Near a 10 kHz sampling rate, for example, the Chebyshev ERB RBME is increased from 12% in the Bark-scale case to around 37%, again a tripling of the peak error. We can also see in Fig.E.16 that the arctangent formula gives a very good approximation to the optimal Chebyshev solution at all sampling rates. The optimal least-squares and weighted equation-error solutions are quite different, with the weighted equation-error solution moving from being close to the least-squares solution at low sampling rates, to being close to the Chebyshev solution at the higher sampling rates. The rms error is very similar in all four cases, as it was in the Bark-scale case, although the Chebyshev and arctangent formula solutions show noticeable increase in the rms error at low sampling rates where they also show a reduction in peak error by 5% or so.
For an approximation to the optimal Chebyshev ERB frequency mapping, the arctangent formula becomes

$$\rho^*_\gamma(f_s) = 0.7446\left[\frac{2}{\pi}\arctan(0.1418\,f_s)\right]^{1/2} + 0.03237$$

where $f_s$ is in kHz. This formula is plotted along with the various optimal curves in Fig.E.12a, and the approximation error is shown in Fig.E.12b. The performance of the arctangent approximation can be seen in Fig.E.13.
The performance of this formula is shown in Fig.E.16. It follows the optimal Chebyshev map parameter very well.
Audio conformal maps can be adjusted by using a more general error weighting versus frequency. For example, the weighting can be set to zero above some frequency limit along the unit circle. A more general weighting can also be used to obtain improved accuracy in specific desired frequency ranges. Again, these refinements would seem to be of interest primarily for the ERB-scale and other mappings, since the Bark-scale warping is excellent already. The diagonal weighting matrix in the weighted equation error solution (E.3.1) can be multiplied by any desired application-dependent weighting.
As another variation, an auditory frequency scale could be defined based on the cochlear frequency-to-place function. In this case, a close relationship still exists between equal-place increments along the basilar membrane and equal bandwidth increments in the defined audio filter bank. Preliminary comparisons [96, Fig. 9] indicate that the first-order conformal map errors for this case are qualitatively between the ERB and Bark-scale cases. The first-order conformal map works best when the auditory filter bandwidths level off to a minimum width at low frequencies, as they do in the Bark-scale case below 500 Hz. Thus, the question of the ``audio fidelity'' of the first-order conformal map is directly tied to the question of what is really the best frequency resolution to provide at low frequencies in the auditory filter bank.
The first-order ``allpass'' conformal map which maps the unit circle to itself was configured to approximate frequency warpings from a linear frequency scale to either a Bark scale or an ERB frequency scale for a wide variety of sampling rates. The accuracy of this warping is extremely good for the Bark-scale case, and fair also for the ERB case; the first-order conformal map shows significantly more error in the ERB case (about three times that of the Bark-scale case) due to its narrower resolution bandwidths at low frequencies.
A closed-form expression was derived for the allpass coefficient which minimizes the $L^2$ norm of the weighted equation error between samples of the allpass warping and the desired Bark or ERB warpings. The weighting function was designed to give estimates as close as possible to the optimal least-squares estimate, and comparisons showed this to be well achieved, especially in the Bark-scale case.
A simple, closed-form, invertible expression which comes very close to the optimal Chebyshev allpass coefficient vs. sampling rate was given in (E.3.5) for the Bark-scale case and in (E.5.2) for the ERB-scale case.
Three optimal conformal maps were defined based on Chebyshev, least squares, and weighted equation-error approximation, and all three mappings were found to be psychoacoustically identical, for most practical purposes, in the Bark-scale case. When using optimal maps, the peak relative bandwidth mapping error is about 20% in the Bark-scale case and 60% in the ERB-scale case.
We conclude that the first-order conformal map is a highly useful tool
for audio digital filter design and related applications in digital
audio signal processing which may benefit from an order-invariant
mapping of the unit circle from a linear frequency scale to an
approximate auditory frequency scale.
Matlab code for plots, optimizations, and the filter design example
presented here are available online at