Bilinear Frequency-Warping for Audio Spectrum Analysis over Bark and ERB Frequency Scales

With the increasing use of frequency-domain techniques in audio signal processing applications such as audio compression, there is increasing emphasis on psychoacoustic-based spectral measures [274,17,113,118]. In particular, frequency warping is an important tool in spectral audio signal processing. For example, audio spectrograms (Chapter 7) can display signal energy versus time over a more perceptual, nonuniform, audio frequency axis (§7.3). Also, methods for digital filter design (Chapter 4) having no weighting function versus frequency, such as linear predictive coding (LPC) (§10.3), can be given an effective weighting function by means of frequency warping [278].

A common choice of audio frequency warping in audio applications is from a linear frequency scale to a Bark frequency scale (also called ``critical band rate'') [306,307,304,179,102,269]. The Bark scale is defined so that critical bands of hearing are uniformly spaced. (One critical bandwidth equals one Bark.)

A more recently developed psychoacoustic frequency scale, called the Equivalent Rectangular Bandwidth (ERB) scale [88], is based on different psychoacoustic experiments resulting in generally narrower critical bandwidth estimates.

This appendix, condensed from [269,268], describes a useful class approximate Bark/ERB frequency warpings that may be implemented using a bilinear transform (first-order conformal map of the unit circle to itself in the $ z$ plane). Such warpings preserve order in filter-design applications. That is, the warping can be undone by the inverse bilinear transform which, because its first order, does not change the order of the filter that was designed over the warped frequency axis.

The Bark Frequency Scale

Based on the results of many psychoacoustic experiments, the Bark scale is defined so that the critical bands of human hearing each have a width of one Bark. By representing spectral energy (in dB) over the Bark scale, a closer correspondence is obtained with spectral information processing in the ear (§7.3).

The Bark scale ranges from 1 to 24 Barks, corresponding to the first 24 critical bands of hearing [304]. The published Bark band edges are given in Hertz as [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500]. The published band centers in Hertz are [50, 150, 250, 350, 450, 570, 700, 840, 1000, 1170, 1370, 1600, 1850, 2150, 2500, 2900, 3400, 4000, 4800, 5800, 7000, 8500, 10500, 13500]. These center-frequencies and bandwidths are to be interpreted as samplings of a continuous variation in the frequency response of the ear to a sinusoid or narrow-band noise process. That is, critical-band-shaped masking patterns should be seen as forming around specific stimuli in the ear rather than being associated with a specific fixed filter bank in the ear.

Note that since the Bark scale is defined only up to 15.5 kHz, the highest sampling rate for which the Bark scale is defined up to the Nyquist limit, without requiring extrapolation, is 31 kHz. The 25th Bark band certainly extends above 19 kHz (the sum of the 24th Bark band edge and the 23rd critical bandwidth), so that a sampling rate of 40 kHz is implicitly supported by the data. We have extrapolated the Bark band-edges in our work, appending the values [20500, 27000] so that sampling rates up to 54 kHz are defined. While human hearing generally does not extend above 20 kHz, audio sampling rates as high as 48 kHz or higher are common in practice.

The Bark scale is defined above in terms of frequency in Hz versus Bark number. For computing optimal bilinear transformations, it is preferable to optimize the fit to the inverse of this map, i.e., Barks versus Hz, so that the mapping error will be measured in Barks rather than Hz.


The Bilinear Transform

The formula for a general first-order (bilinear) conformal mapping of functions of a complex variable is conveniently expressed by [42, page 75]

$\displaystyle {(\zeta -\zeta _1)(\zeta _2-\zeta _3) \over
(\zeta _2-\zeta _1)(\zeta -\zeta _3)} =
{(z-z_1)(z_2-z_3) \over
(z_2-z_1)(z-z_3)}.
$

It can be seen that choosing three specific points and their images determines the mapping for all $ z$ and $ \zeta $ .

Bilinear transformations map circles and lines into circles and lines (lines being viewed as circles passing through the point at infinity). In digital audio, where both domains are ``$ z$ planes,'' we normally want to map the unit circle to itself, with dc mapping to dc ( $ z_1=\zeta _1=1$ ) and half the sampling rate mapping to half the sampling rate ( $ z_2=\zeta _2=-1$ ). Making these substitutions in (E.2) leaves us with transformations of the form

$\displaystyle z= {\cal A}_{\rho }(\zeta ) = {\zeta + \rho \over 1 + \zeta \rho } , \qquad \rho = {\zeta _3 - z_3 \over 1 - z_3\zeta _3}.$ (E.1)

The constant $ \rho $ provides one remaining degree of freedom which can be used to map any particular frequency $ \omega$ (corresponding to the point $ e^{j\omega }$ on the unit circle) to a new location $ a(\omega )$ . All other frequencies will be warped accordingly. Note that this class of ``circle to circle'' bilinear transformations takes the form of the transfer function of an allpass filter. We therefore call it an ``allpass transformation''. The ``allpass coefficient'' $ \rho $ can be written in terms of the frequencies $ \omega$ and $ a(\omega )$ as

$\displaystyle \rho = {\sin\{[a(\omega )-\omega ]/2\} \over \sin\{[a(\omega )+\omega ]/2\} }.$ (E.2)

In this form, it is clear that $ \rho $ is real, and that the inverse of $ {\cal A}_{\rho }$ is $ {\cal A}_{-\rho }$ . Also, since $ 0\leq\{\omega ,a(\omega )\}\leq\pi$ , and $ a(\omega )\geq\omega $ for an audio warping (where low frequencies must be ``stretched out'' relative to high frequencies), we have $ \rho \in[0,1)$ for audio-type mappings from the $ z$ plane to the $ \zeta $ plane.


Optimal Bilinear Bark Warping

It turns out that a first-order conformal map (bilinear transform) can provide a surprisingly close match to the Bark frequency scale [268,269]. This is shown in Fig.E.1.

Figure: Bark and allpass frequency warpings at a sampling rate of $ 31$ kHz (the highest possible without extrapolating the published Bark scale bandlimits). a) Bark frequency warping viewed as a conformal mapping of the interval $ [0,\pi ]$ to itself on the unit circle. b) Same mapping interpreted as an auditory frequency warping from Hz to Barks; the legend shown in plot a) also applies to plot b). The legend additionally displays the optimal allpass parameter $ \rho $ used for each map. The discrete band-edges which define the Bark scale are plotted as circles. The optimal Chebyshev (solid), least-squares (dashed), and weighted equation-error (dot-dashed) allpass parameters produce mappings which are nearly identical. Also plotted (dotted) is the mapping based on an allpass parameter given by an analytic expression in terms of the sampling rate, which will be described. It should be pointed out that the fit improves as the sampling rate is decreased.
\includegraphics[width=\twidth]{eps/fitlogf}

In the following, a simple direct-form expression is developed for the map parameter $ \rho $ giving the best least-squares fit to a Bark scale for a chosen sampling rate. As Fig.E.1 shows, the error is so small that the solution is also very close to the optimal Chebyshev fit. In fact, the $ L2$ optimal warping is within 0.04 Bark of the $ L-infinity$ optimal warping. Since the experimental uncertainty when measuring critical bands is on the order of a tenth of a Bark or more [178,181,251,298], we consider the optimal Chebyshev and least-squares maps to be essentially equivalent psychoacoustically.

Computing $ \rho $

Our goal is to find the allpass coefficient $ \rho $ such that the frequency mapping

$\displaystyle a(\omega )=$   angle$\displaystyle \left\{{\cal A}_{-\rho }(e^{j\omega }) \right\}
$

best approximates the Bark scale $ b(\omega )$ for a given sampling rate $ f_s$ . (Note that the frequencies $ \omega$ , $ a(\omega )$ , and $ b(\omega )$ are all expressed in radians per sample, so that a frequency of half of the sampling rate corresponds to a value of $ \pi$ .)

Using squared frequency errors to gauge the fit between $ a(\omega )$ and its Bark-warped counterpart, the optimal mapping-parameter $ \rho ^*$ may be written as

$\displaystyle \rho ^*= \hbox{Arg}\left[\min_{\rho }\left\{\left\Vert\,a(\omega )- b(\omega )\,\right\Vert\right\}\right],
$

where $ \left\Vert\,\cdot\,\right\Vert$ represents the $ L2$ norm. (The superscript `$ \ast$ ' denotes optimality in some sense.) Unfortunately, the frequency error

$\displaystyle \epsilon _{\hbox{\tiny A}}\isdef a(\omega )- b(\omega )
$

is nonlinear in $ \rho $ , and its norm is not easily minimized directly. It turns out, however, that a related error,

$\displaystyle \epsilon _{\hbox{\tiny C}}\isdef e^{ja(\omega )}- e^{jb(\omega )},
$

has a norm which is more amenable to minimization. The first issue we address is how the minimizers of $ \left\Vert\,\epsilon _{\hbox{\tiny A}}\,\right\Vert$ and $ \left\Vert\,\epsilon _{\hbox{\tiny C}}\,\right\Vert$ are related.

Figure E.2: Frequency Map Errors
\includegraphics[width=3in]{eps/eaec}

Denote by $ \zeta $ and $ \beta $ the complex representations of the frequencies $ a(\omega )$ and $ b(\omega )$ on the unit circle,

$\displaystyle \zeta = e^{ja(\omega )}, \qquad \beta = e^{jb(\omega )}.
$

As seen in Fig.E.2, the absolute frequency error $ \vert\epsilon _{\hbox{\tiny A}}\vert$ is the arc length between the points $ \zeta $ and $ \beta $ , whereas $ \vert\epsilon _{\hbox{\tiny C}}\vert$ is the chord length or distance:

$\displaystyle \vert\epsilon _{\hbox{\tiny C}}\vert = 2\sin(\vert\epsilon _{\hbox{\tiny A}}\vert/2).
$

The desired arc length error $ \epsilon _{\hbox{\tiny A}}$ gives more weight to large errors than the chord length error $ \epsilon _{\hbox{\tiny C}}$ ; however, in the presence of small discrepancies between $ \zeta $ and $ \beta $ , the absolute errors are very similar,

$\displaystyle \vert\epsilon _{\hbox{\tiny C}}\vert \approx \vert\epsilon _{\hbox{\tiny A}}\vert, \quad \mbox{when } \vert\epsilon _{\hbox{\tiny A}}\vert\ll1.
$

Accordingly, essentially the same $ \rho ^*$ results from minimizing $ \left\Vert\,\epsilon _{\hbox{\tiny A}}\,\right\Vert$ or $ \left\Vert\,\epsilon _{\hbox{\tiny C}}\,\right\Vert$ when the fit is uniformly good over frequency.

The error $ \epsilon _{\hbox{\tiny C}}$ is also nonlinear in the parameter $ \rho $ , and to find its norm minimizer, an equation error is introduced, as is common practice in developing solutions to nonlinear system identification problems [152]. Consider mapping the frequency $ z=e^{j\omega}$ via the allpass transformation $ {\cal A}_{-\rho }(z)$ ,

$\displaystyle \zeta = {z- \rho \over 1- z\rho }.
$

Now, multiply (E.3.1) by the denominator $ (1-z\rho )$ , and substitute $ \zeta =\beta +\epsilon _{\hbox{\tiny C}}$ from (E.3.1), to get

$\displaystyle (\beta + \epsilon _{\hbox{\tiny C}}) (1 - z\rho ) = z- \rho .
$

Rearranging terms, we have

$\displaystyle (\beta - z) - (\beta z- 1) \rho = \epsilon _{\hbox{\tiny E}},
$

where $ \epsilon _{\hbox{\tiny E}}$ is an equation error defined by

$\displaystyle \epsilon _{\hbox{\tiny E}}\isdef (z\rho - 1) \epsilon _{\hbox{\tiny C}}.
$

It is shown in [269] that the optimal weighted least-squares conformal map parameter estimate is given by

$\displaystyle \rho ^*= {\hbox{\boldmath$s$}^\top \hbox{\boldmath$V$}\hbox{\boldmath$d$}\over \hbox{\boldmath$s$}^\top \hbox{\boldmath$V$}\hbox{\boldmath$s$}} .
$

If the weighting matrix $ \hbox{\boldmath $V$}$ is diagonal with kth diagonal element $ v(\omega_{k})>0$ , then the weighted least-squares solution (E.3.1) reduces to
$\displaystyle \rho ^*$ $\displaystyle =$ $\displaystyle \frac{\sum_{k=1}^K
v(\omega _k)
\sin\left[\frac{b(\omega_{k})-\omega_{k}}{2}\right]
\sin\left[\frac{b(\omega_{k})+\omega_{k}}{2}\right]
}{\sum_{k=1}^K
v(\omega _k)\sin^2\left[\frac{b(\omega_{k})+\omega_{k}}{2}\right]}$  
  $\displaystyle =$ $\displaystyle \frac{\sum_{k=1}^{K}
v(\omega _k)
\left\{\cos\left[b(\omega_{k})\right]- \cos(\omega_{k})\right\}}{%
\sum_{k=1}^{K} v(\omega_{k}) \left\{\cos\left[b(\omega_{k}) + \omega_{k}\right]- 1\right\}}.$  

The kth diagonal element of an optimal diagonal weighting matrix $ \hbox{\boldmath $V$}$ is given by [269]

$\displaystyle v(\omega_{k}) = {1\over 1 + \rho ^2 - 2\rho \cos\omega_{k}},
$

Note that the desired weighting depends on the unknown map parameter $ \rho $ . To overcome this difficulty, we suggest first estimating $ \rho ^*$ using $ \hbox{\boldmath $V$}= \hbox{\boldmath $I$}$ , where $ \hbox{\boldmath $I$}$ denotes the identity matrix, and then computing $ \rho ^*$ using the weighting (E.3.1) based on the unweighted solution. This is analogous to the Steiglitz-McBride algorithm for converting an equation-error minimizer to the more desired ``output-error'' minimizer using an iteratively computed weight function [151].


Optimal Frequency Warpings

In [269], optimal allpass coefficients $ \rho ^*$ were computed for sampling rates of twice the Bark band-edge frequencies by means of four different optimization methods:

  1. Minimize the peak arc-length error $ \left\Vert\,\epsilon _{\hbox{\tiny A}}\,\right\Vert _\infty$ at each sampling rate to obtain the optimal Chebyshev allpass parameter $ \rho ^*_\infty (f_s)$ .
  2. Minimize the sum of squared arc-length errors $ \left\Vert\,\epsilon _{\hbox{\tiny A}}\,\right\Vert _2^2$ to obtain the optimal least-squares allpass parameter $ \rho ^*_2(f_s)$ .
  3. Use the closed-form weighted equation-error solution (E.3.1) computed twice, first with $ \hbox{\boldmath $V$}= \hbox{\boldmath $I$}$ , and second with $ \hbox{\boldmath $V$}$ set from (E.3.1) to obtain the optimal ``weighted equation error'' solution $ \rho ^*_{\hbox{\sc E}}(f_s)$ .
  4. Fit the function $ \gamma_1\left[{2\over\pi}\arctan(\gamma_2f_s)\right]^{{1\over2}}+\gamma_3 $ to the optimal Chebyshev allpass parameter $ \rho ^*_\infty (f_s)$ via Chebyshev optimization with respect to $ {\mathbf\gamma}\isdef \{\gamma_1,\gamma_2,\gamma_3\}$ . We will refer to the resulting function as the ``arctangent approximation'' $ \rho ^*_{\mathbf\gamma}(f_s)$ (or, less formally, the ``Barktan formula''), and note that it is easily computed directly from the sampling rate.
In all cases, the error minimized was in units proportional to Barks. The discrete frequency grid in all cases was taken to be the Bark band-edges given in §E.1. The resulting allpass coefficients are plotted as a function of sampling rate in Fig.E.3.

Figure: a) Optimal allpass coefficients $ \rho ^*_\infty $ , $ \rho ^*_2$ , and $ \rho ^*_{\hbox {\sc E}}$ , plotted as a function of sampling rate $ f_s$ . Also shown is the arctangent approximation $ \rho ^*_{\mathbf\gamma}=1.0674\sqrt{(2/\pi)\arctan(0.06583f_s)}-0.1916$ . b) Same as a) with the arctangent approximation subtracted out. Note the nearly identical behavior of optimal least-squares (plus signs) and weighted equation-error (circles).
\includegraphics[width=\twidth]{eps/pfs}

Figure E.4: Root-mean-square and peak frequency-mapping errors versus sampling rate for Chebyshev, least squares, weighted equation-error, and arctangent optimal maps. The rms errors are nearly coincident along the lower line, while the peak errors a little more spread out well above the rms errors.
\includegraphics[width=\twidth]{eps/rmspkerr}

The peak and rms frequency-mapping errors are plotted versus sampling rate in Fig.E.4. Peak and rms errors in BarksE.1 are plotted for all four cases (Chebyshev, least squares, weighted equation-error, and arctangent approximation). The conformal-map fit to the Bark scale is generally excellent in all cases. We see that the rms error is essentially identical in the first three cases, although the Chebyshev rms error is visibly larger below 10 kHz. Similarly, the peak error is essentially the same for least squares and weighted equation error, with the Chebyshev case being able to shave almost 0.1 Bark from the maximum error at high sampling rates. The arctangent formula shows up to a tenth of a Bark larger peak error at sampling rates 15-30 and 54 kHz, but otherwise it performs very well; at 41 kHz and below 12 kHz the arctangent approximation is essentially optimal in all senses considered.

At sampling rates up to the maximum non-extrapolated sampling rate of $ 31$ kHz, the peak mapping errors are all much less than one Bark (0.64 Barks for the Chebyshev case and 0.67 Barks for the two least squares cases). The mapping errors in Barks can be seen to increase almost linearly with sampling rate. However, the irregular nature of the Bark-scale data results in a nonmonotonic relationship at lower sampling rates.

Figure: Frequency mapping errors versus frequency for a sampling rate of $ 31$ kHz.
\includegraphics[width=\twidth]{eps/fme}

The specific frequency mapping errors versus frequency at the $ 31$ kHz sampling rate (the same case shown in Fig.E.1) are plotted in Fig.E.5. Again, all four cases are overlaid, and again the least squares and weighted equation-error cases are essentially identical. By forcing equal and opposite peak errors, the Chebyshev case is able to lower the peak error from 0.67 to 0.64 Barks. A difference of 0.03 Barks is probably insignificant for most applications. The peak errors occur at 1.3 kHz and 8.8 kHz where the error is approximately 2/3 Bark. The arctangent formula peak error is 0.73 Barks at 8.8 kHz, but in return, its secondary error peak at 1.3 kHz is only 0.55 Barks. In some applications, such as when working with oversampled signals, higher accuracy at low frequencies at the expense of higher error at very high frequencies may be considered a desirable tradeoff.

We see that the mapping falls ``behind'' a bit as frequency increases from zero to 1.3 kHz, mapping linear frequencies slightly below the desired corresponding Bark values; then, the mapping ``catches up,'' reaching an error of 0 Barks near 3 kHz. Above 3 kHz, it gets ``ahead'' slightly, with frequencies in Hz being mapped a little too high, reaching the positive error peak at 8.8 kHz, after which it falls back down to zero error at $ z=e^{j\pi}$ . (Recall that dc and half the sampling-rate are always points of zero error by construction.)

Figure: Relative bandwidth mapping error (RBME) for a $ 31$ kHz sampling rate using the optimized allpass warpings of Fig.E.3 at $ 31$ kHz. The optimal Chebyshev, least squares, and weighted equation-error cases are almost indistinguishable.
\includegraphics[width=\twidth]{eps/rbe}


Bark Relative Bandwidth Mapping Error

Figure: RBME for a $ 31$ kHz sampling rate, with explicit minimization of RBME in the optimizations.
\includegraphics[width=\twidth]{eps/rbeslp}

The slope of the frequency versus warped-frequency curve can be interpreted as being proportional to critical bandwidth, since a unit interval (one Bark) on the warped-frequency axis is magnified by the slope to restore the band to its original size (one critical bandwidth). It is therefore interesting to look at the relative slope error, i.e., the error in the slope of the frequency mapping divided by the ideal Bark-map slope. We interpret this error measure as the relative bandwidth-mapping error (RBME). The RBME is plotted in Fig.E.6 for a $ 31$ kHz sampling rate. The worst case is 21% for the Chebyshev case and 20% for both least-squares cases. When the mapping coefficient is explicitly optimized to minimize RBME, the results of Fig.E.7 are obtained: the Chebyshev peak error drops from 21% down to 18%, while the least-squares cases remain unchanged at 20% maximum RBME. A 3% change in RBME is comparable to the 0.03 Bark peak-error reduction seen in Fig.E.5 when using the Chebyshev norm instead of the $ L2$ norm; again, such a small difference is not likely to be significant in most applications.

Figure E.8: Root-mean-square and peak relative-bandwidth-mapping errors versus sampling rate for Chebyshev, least squares, weighted equation-error, and arctangent optimal maps, with explicit minimization of RBME used in all optimizations. The peak errors form a group lying well above the lower lying rms group.
\includegraphics[width=\twidth]{eps/pkrbmeslp}

Similar observations are obtained at other sampling rates, as shown in Fig.E.8. Near a 10 kHz sampling rate, the Chebyshev RBME is reduced from 17% when minimizing absolute error in Barks (not shown in any figure) to around 12% by explicitly minimizing the RBME, and this is the sampling-rate range of maximum benefit. At 15.2, 19, 41, and 54 kHz sampling rates, the difference is on the order of only 1%. Other cases generally lie between these extremes. The arctangent formula generally falls between the Chebyshev and optimal least-squares cases, except at the highest (extrapolated) sampling rate 54 kHz. The rms error is very similar in all four cases, although the Chebyshev case has a little larger rms error near a 10 kHz sampling rate, and the arctangent case gives a noticeably larger rms error at 54 kHz.


Error Significance

In one study, young normal listeners exhibited a standard deviation in their measured auditory bandwidths (based on notched-noise masking experiments) on the order of 10% of center frequency [178]. Therefore, a 20% peak error in mapped bandwidth (typical for sampling rates approaching 40 kHz) could be considered significant. However, the range of auditory-filter bandwidths measured in 93 young normal subjects at 2 kHz [178] was 230 to 410 Hz, which is -26% to +32% relative to 310 Hz. In [298], 40 subjects were measured, yielding auditory-filter bandwidths between -33% and +65%, with a standard deviation of 18%. It may thus be concluded that a worst-case mapping error on the order of 20%, while probably detectable by ``golden ears'' listeners, lies well within the range of experimental deviations in the empirical measurement of auditory bandwidth.

As a worst-case example of how the 18% peak bandwidth-mapping error in Fig.E.7 might correspond to an audible distortion, consider one critical band of noise centered at the frequency of maximum negative mapping error, scaled to be the same loudness as a single critical band of noise centered at the frequency of maximum positive error. The systematic nature of the mapping error results in a narrowing of the lower band and expansion of the upper band by about 1.7 dB. As a result, over the warped frequency axis, the upper band will be effectively emphasized over the lower band by about 3 dB.


Arctangent Approximations for $ \rho ^*(f_s)$

This subsection provides further details on the arctangent approximation for the optimal allpass coefficient as a function of sampling rate. Compared with other spline or polynomial approximations, the arctangent form

$\displaystyle \rho _{\mathbf\gamma}(f_s) \isdef \max\left\{0,\gamma_1\left[{2\over\pi}\arctan(\gamma_2f_s)\right]^{{1\over2}}+\gamma_3 \right\}
$

was found to provide a more parsimonious expression at a given accuracy level. The idea was that the arctangent function provided a mapping from the interval $ [0,\infty)$ , the domain of $ f_s$ , to the interval $ [0,1)$ , the range of $ \rho (f_s)$ . The additive component $ \gamma_3$ allowed $ \rho _{\mathbf\gamma}(f_s)$ to be zero at smaller sampling rates, where the Bark scale is linear with frequency. As an additional benefit, the arctangent expression was easily inverted to give sampling rate $ f_s$ in terms of the allpass coefficient $ \rho _{\mathbf\gamma}$ :

$\displaystyle f_s= {1\over \gamma_2}\tan\left[{\pi\over2} \left(\frac{\rho _{\mathbf\gamma}- \gamma_3}{\gamma_1}\right)^2\right].
$

To obtain the optimal arctangent form $ \rho ^*_{\mathbf\gamma}(f_s)$ , the expression for $ \rho _{\mathbf\gamma}(f_s)$ in (E.3.5) was optimized with respect to its free parameters $ {\mathbf\gamma}=\{\gamma_1,\gamma_2,\gamma_3\}$ to match the optimal Chebyshev allpass coefficient as a function of sampling rate:

$\displaystyle \rho ^*_{\mathbf\gamma}(f_s) \isdef \hbox{Arg}\left[\min_{{\mathbf\gamma}}\left\{\left\Vert\,\rho ^*_\infty(f_s) - \rho _{\mathbf\gamma}(f_s)\,\right\Vert _\infty\right\}\right].
$

For a Bark warping, the optimized arctangent formula was found to be

$\displaystyle \rho ^*_{\mathbf\gamma}(f_s) = 1.0674\left[{2\over\pi}\arctan(0.06583f_s)\right]^{{1\over2}}-0.1916,
$

where $ f_s$ is expressed in units of kHz. This formula is plotted along with the various optimal $ \rho ^*$ curves in Fig.E.3a, and the approximation error is shown in Fig.E.3b. It is extremely accurate below 15 kHz and near 40 kHz, and adds generally less than 0.1 Bark to the peak error at other sampling rates. The rms error versus sampling rate is very close to optimal at all sampling rates, as Fig.E.4 also shows.

When the optimality criterion is chosen to minimize relative bandwidth mapping error (relative map slope error), the arctangent formula optimization yields

$\displaystyle \rho ^*_{\mathbf\gamma}(f_s) = 1.0480\left[{2\over\pi}\arctan(0.07212f_s)\right]^{{1\over2}}-0.1957.
$

The performance of this formula is shown in Fig.E.8. It tends to follow the performance of the optimal least squares map parameter even though the peak parameter error was minimized relative to the optimal Chebyshev map. At 54 kHz there is an additional 3% bandwidth error due to the arctangent approximation, and near 10 kHz the additional error is about 4%; at other sampling rates, the performance of the RBME arctangent approximation is better, and like (E.3.5), it is extremely accurate at 41 kHz.


Application to Audio Filter Design

Frequency warping is generally employed in audio filter design by

  1. warping the desired frequency response, thus ``horizontally stretching'' the more important low-frequency region of the spectrum.
  2. performing a filter design over the warped frequency axis, and
  3. transforming the resulting filter to eliminate the frequency warp, returning it to the normal frequency axis.
The third step may be carried out using a conformal map (i.e., substituting some rational-function-of-$ z$ for $ z$ in the filter transfer function). Since bilinear-transform frequency-mappings are first order, when the resulting filter transformed back to unwarped form, its order remains the same [258].

Filter Design Example

Figure: Filter Design Example: Overlay of measured and modeled magnitude transfer functions, where the model is a $ 12$ th-order filter designed by Prony's method. a) Results without prewarping of the frequency axis. b) Results using the Bark bilinear transform prewarping.
\includegraphics[width=\twidth]{eps/fd}

We conclude discussion of the Bark bilinear transform with the filter design example of Fig.E.9. A $ 12$ th-order pole-zero filter was fit using Prony's method [162] to the equalization function plotted in the figure as a dashed line. Prony's method was applied normally over a uniformly sampled linear frequency grid in the example of Fig.E.9a, and over an approximate Bark-scale axis in the example of Fig.E.9b. The procedure in the Bark-scale case was as follows [258]:E.2

  1. The optimal allpass coefficient $ \rho ^*_{\mathbf\gamma}(f_s)$ was found using (E.3.5).

  2. The desired frequency response $ H(e^{j\omega})$ defined on a linear frequency axis $ \omega$ was warped to an approximate Bark scale $ a(\omega )$ using the Bark bilinear transform, $ {\tilde H}(e^{j\omega }) \isdef H[{\cal A}_{\rho }(e^{ja(\omega )})]$ .

  3. A parametric ARMA model $ {\tilde H}^*(\zeta )$ was fit to the desired Bark-warped frequency response $ {\tilde H}(e^{j\omega })$ over the unit circle $ \zeta =e^{j\omega }$ .

  4. Finally, the inverse Bark bilinear transform $ \zeta \leftarrow {\cal A}_{-\rho }(z)$ was used to ``unwarp'' the modeled system to a linear frequency axis.

Referring to Fig.E.9, it is clear that the warped solution provides a better overall fit than the direct solution which sacrifices accuracy below $ 4$  kHz to achieve a tighter fit above $ 10$  kHz. In some part, the spacing of spectral features is responsible for the success of the Bark-warped model in this particular example. However, we generally recommend using the Bark bilinear transform to design audio filters, since doing so weights the error norm (for norms other than Chebyshev types) in a way which gives equal importance to matching features having equal Bark bandwidths. Even in the case of Chebyshev optimization, auditory warping appears to improve the numerical conditioning of the filter design problem; this applies also to optimization under the Hankel norm which includes an optimal Chebyshev design internally as an intermediate step. Further filter-design examples, including more on the Hankel-norm case, may be found in [258].


Equivalent Rectangular Bandwidth

It also turns out that a first-order conformal map (bilinear transform) can provide a good match to the ERB scale [269] as well. Moore and Glasberg [177] have revised Zwicker's loudness model to better explain (1) how equal-loudness contours change as a function of level, (2) why loudness remains constant as the bandwidth of a fixed-intensity sound increases up to the critical bandwidth, and (3) the loudness of partially masked sounds. The modification that is relevant here is the replacement of the Bark scale by the equivalent rectangular bandwidth (ERB) scale. The ERB of the auditory filter is assumed to be closely related to the critical bandwidth, but it is measured using the notched-noise method [205,206,251,181,87] rather than on classical masking experiments involving a narrow-band masker and probe tone [306,307,304]. As a result, the ERB is said not to be affected by the detection of beats or intermodulation products between the signal and masker. Since this scale is defined analytically, it is also more smoothly behaved than the Bark scale data.

Figure: Bark critical bandwidth and equivalent rectangular bandwidth as a function of frequency. Also plotted is the classical rule of thumb that a critical band is 100 Hz wide for center frequencies below 500 Hz, and 20% of the center frequency above 500 Hz. Also plotted is the empirically determined formula, CB bandwidth in Hz $ \approx 94+71f^{3/2}$ , with $ f$ in kHz [37]. The ERBs are computed from (E.5), and the Bark CB bandwidths were computed by differencing the band-edge frequencies listed in §E.1, plotting each difference over its corresponding band center (also listed in §E.1).
\includegraphics[width=\twidth]{eps/erbbark}

At moderate sound levels, the ERB in Hz is defined by [177]

   ERB$\displaystyle (f) = 0.108 f + 24.7
$

where $ f$ is center-frequency in Hz, normally in the range 100 Hz to 10 kHz. The ERB is generally narrower than the classical critical bandwidth (CB), being about $ 11$ % of center frequency at high frequencies, and leveling off to about $ 25$ Hz at low frequencies. The classical CB, on the other hand, is approximately $ 20$ % of center frequency, leveling off to $ 100$ Hz below $ 500$ Hz. An overlay of ERB and CB bandwidths is shown in Fig.E.10. Also shown is the approximate classical CB bandwidth, as well as a more accurate analytical expression for Bark bandwidth vs. Hz [3]. Finally, note that the frequency interval [$ 400$ Hz, $ 6.5$ kHz] corresponds to good agreement between the psychophysical ERB and the directly physical audio filter bandwidths defined in terms of place along the basilar membrane [96, p. 2601].

Figure: Bark and ERB frequency warpings for a sampling rate of $ 31$  kHz. a) Linear input frequency scale. b) Log input frequency scale. Note that sampling is uniform across the vertical axis (corresponding to the desired audio frequency scale). As a result, the plotted samples align horizontally rather than vertically.
\includegraphics[width=\twidth]{eps/erbbarkm}

The ERB scale is defined as the number of ERBs below each frequency

   ERBS$\displaystyle (f) = 21.4 \log_{10}(0.00437 f + 1)
$

for $ f$ in Hz [177]. An overlay of the normalized Bark and ERB frequency warpings is shown in Fig.E.11. The ERB warping is determined by scaling the inverse of (E.5), evaluated along a uniform frequency grid from zero to the number of ERBs at half the sampling rate, so that dc maps to zero and half the sampling rate maps to $ \pi$ .

Proceeding in the same manner as for the Bark-scale case, allpass coefficients giving a best approximation to the ERB-scale warping were computed for sampling rates near twice the Bark band edge frequencies (chosen to facilitate comparison between the ERB and Bark cases). The resulting optimal map coefficients are shown in Fig.E.12. The allpass parameter increases with increasing sampling rate, as in the Bark-scale case, but it covers a significantly narrower range, as a comparison with Fig.E.3 shows. Also, the Chebyshev solution is now systematically larger than the least-squares solutions, and the least-squares and weighted equation-error cases are no longer essentially identical. The fact that the arctangent formula is optimized for the Chebyshev case is much more evident in the error plot of Fig.E.12b than it was in Fig.E.3b for the Bark warping parameter.

Figure: a) Optimal allpass coefficients $ \rho ^*$ for the ERB case, plotted as a function of sampling rate $ f_s$ . Also shown is the arctangent approximation. b) Same as a) with the arctangent formula subtracted out.
\includegraphics[width=\twidth]{eps/pfserb}

Figure E.13: Root-mean-square and peak frequency-mapping errors (conformal map minus ERB) versus sampling rate for Chebyshev, least squares, weighted equation-error, and arctangent optimal maps. The rms errors are nearly coincident along the lower line, while the peak errors form an upper group well above the rms errors.
\includegraphics[width=\twidth]{eps/rmspkerrerb}

The peak and rms mapping errors are plotted versus sampling rate in Fig.E.13. Compare these results for the ERB scale with those for the Bark scale in Fig.E.4. The ERB map errors are plotted in Barks to facilitate comparison. The rms error of the conformal map fit to the ERB scale increases nearly linearly with log-sampling-rate. The ERB-scale error increases very smoothly with frequency while the Bark-scale error is non-monotonic (see Fig.E.4). The smoother behavior of the ERB errors appears due in part to the fact that the ERB scale is defined analytically while the Bark scale is defined more directly in terms of experimental data: The Bark-scale fit is so good as to be within experimental deviation, while the ERB-scale fit has a much larger systematic error component. The peak error in Fig.E.13 also grows close to linearly on a log-frequency scale and is similarly two to three times the Bark-scale errors of Fig.E.4.

Figure: ERB frequency mapping errors versus frequency for the sampling rate $ 31$ kHz.
\includegraphics[width=\twidth]{eps/fmeerb}

The frequency mapping errors are plotted versus frequency in Fig.E.14 for a sampling rate of $ 31$ kHz. Unlike the Bark-scale case in Fig.E.5, there is now a visible difference between the weighted equation-error and optimal least-squares mappings for the ERB scale. The figure shows also that the peak error when warping to an ERB scale is about three times larger than the peak error when warping to the Bark scale, growing from 0.64 Barks to 1.9 Barks. The locations of the peak errors are also at lower frequencies (moving from 1.3 and 8.8 kHz in the Bark-scale case to 0.7 and 8.2 kHz in the ERB-scale case).

ERB Relative Bandwidth Mapping Error

Figure: ERB RBME for $ f_s= 31$ kHz, with explicit minimization of RBME.
\includegraphics[width=\twidth]{eps/rbeerbslp}

The optimal relative bandwidth-mapping error (RBME) for the ERB case is plotted in Fig.E.15 for a $ 31$ kHz sampling rate. The peak error has grown from close to 20% for the Bark-scale case to more than 60% for the ERB case. Thus, frequency intervals are mapped to the ERB scale with up to three times as much relative error (60%) as when mapping to the Bark scale (20%). The continued narrowing of the auditory filter bandwidth as frequency decreases on the ERB scale results in the conformal map not being able to supply sufficient stretching of the low-frequency axis. The Bark scale case, on the other hand, is much better provided at low frequencies by the first-order conformal map.

Figure E.16: RMS and peak relative-bandwidth-mapping errors versus sampling rate for Chebyshev, least squares, weighted equation-error, and arctangent optimal maps, with explicit minimization of RBME used in all optimizations. The peak errors form a group lying well above the lower lying rms group.
\includegraphics[width=\twidth]{eps/pkrbmeerbslp}

Figure E.16 shows the rms and peak ERB RBME as a function of sampling rate. Near a 10 kHz sampling rate, for example, the Chebyshev ERB RBME is increased from 12% in the Bark-scale case to around 37%, again a tripling of the peak error. We can also see in Fig.E.16 that the arctangent formula gives a very good approximation to the optimal Chebyshev solution at all sampling rates. The optimal least-squares and weighted equation-error solutions are quite different, with the weighted equation-error solution moving from being close to the least-squares solution at low sampling rates, to being close to the Chebyshev solution at the higher sampling rates. The rms error is very similar in all four cases, as it was in the Bark-scale case, although the Chebyshev and arctangent formula solutions show noticeable increase in the rms error at low sampling rates where they also show a reduction in peak error by 5% or so.


Arctangent Approximations for $ \rho ^*(f_s)$ , ERB Case

For an approximation to the optimal Chebyshev ERB frequency mapping, the arctangent formula becomes

$\displaystyle \rho ^*_{\mathbf\gamma}(f_s) = 0.7446\left[{2\over\pi}\arctan(0.1418f_s)\right]^{{1\over2}}+0.03237,
$

where $ f_s$ is in kHz. This formula is plotted along with the various optimal $ \rho ^*$ curves in Fig.E.12a, and the approximation error is shown in Fig.E.12b. The performance of the arctangent approximation can be seen in Fig.E.13.

When the optimality criterion is chosen to minimize relative bandwidth mapping error in the ERB case, the arctangent formula optimization yields

$\displaystyle \rho ^*_{\mathbf\gamma}(f_s) = 0.7164\left[{2\over\pi}\arctan(0.09669f_s)\right]^{{1\over2}}+0.08667.
$

The performance of this formula is shown in Fig.E.16. It follows the optimal Chebyshev map parameter very well.


Directions for Improvements

Audio conformal maps can be adjusted by using a more general error weighting versus frequency. For example, the weighting can be set to zero above some frequency limit along the unit circle. A more general weighting can also be used to obtain improved accuracy in specific desired frequency ranges. Again, these refinements would seem to be of interest primarily for the ERB-scale and other mappings, since the Bark-scale warping is excellent already. The diagonal weighting matrix $ \hbox{\boldmath $V$}$ in the weighted equation error solution (E.3.1) can be multiplied by any desired application-dependent weighting.

As another variation, an auditory frequency scale could be defined based on the cochlear frequency-to-place function [96]. In this case, a close relationship still exists between equal-place increments along the basilar membrane and equal bandwidth increments in the defined audio filter bank. Preliminary comparisons [96, Fig. 9] indicate that the first-order conformal map errors for this case are qualitatively between the ERB and Bark-scale cases. The first-order conformal map works best when the auditory filter bandwidths level off to a minimum width at low frequencies, as they do in the Bark-scale case below $ 500$ Hz. Thus, the question of the ``audio fidelity'' of the first-order conformal map is directly tied to the question of what is really the best frequency resolution to provide at low frequencies in the auditory filter bank.


Summary

The first-order ``allpass'' conformal map which maps the unit circle to itself was configured to approximate frequency warpings from a linear frequency scale to either a Bark scale or an ERB frequency scale for a wide variety of sampling rates. The accuracy of this warping is extremely good for the Bark-scale case, and fair also for the ERB case; the first-order conformal map shows significantly more error in the ERB case (about three times that of the Bark-scale case) due to its narrower resolution bandwidths at low frequencies.

A closed-form expression was derived for the allpass coefficient which minimizes the norm of the weighted equation error between samples of the allpass warping and the desired Bark or ERB warpings. The weighting function was designed to give estimates as close as possible to the optimal least-squares estimate, and comparisons showed this to be well achieved, especially in the Bark-scale case.

A simple, closed-form, invertible expression which comes very close to the optimal Chebyshev allpass coefficient vs. sampling rate was given in (E.3.5) for the Bark-scale case and in (E.5.2) for the ERB-scale case.

Three optimal conformal maps were defined based on Chebyshev, least squares, and weighted equation-error approximation, and all three mappings were found to be psychoacoustically identical, for most practical purposes, in the Bark-scale case. When using optimal maps, the peak relative bandwidth mapping error is about $ 20$ % in the Bark-scale case and $ 60$ % in the ERB-scale case.

We conclude that the first-order conformal map is a highly useful tool for audio digital filter design and related applications in digital audio signal processing which may benefit from an order-invariant mapping of the unit circle from a linear frequency scale to an approximate auditory frequency scale. Matlab code for plots, optimizations, and the filter design example presented here are available online at
http://ccrma.stanford.edu/~jos/bbt/bbt.html


Next Section:
Examples in Matlab and Octave
Previous Section:
Gaussian Function Properties