# Bilinear Frequency-Warping for Audio Spectrum Analysis over Bark and ERB Frequency Scales

With the increasing use of frequency-domain techniques in audio signal processing applications such as audio compression, there is increasing emphasis on psychoacoustic-based spectral measures [274,17,113,118]. In particular, *frequency warping* is an important tool in spectral audio signal processing. For example, *audio spectrograms* (Chapter 7) can display signal energy versus time over a more perceptual, nonuniform, audio frequency axis (§7.3). Also, methods for *digital filter design* (Chapter 4) having no weighting function versus frequency, such as linear predictive coding (LPC) (§10.3), can be given an effective weighting function by means of frequency warping [278].

A common choice of audio frequency warping in audio applications is from a linear frequency scale to a *Bark frequency scale* (also called "critical band rate") [306,307,304,179,102,269]. The Bark scale is defined so that critical bands of hearing are uniformly spaced. (One critical bandwidth equals one Bark.) A more recently developed psychoacoustic frequency scale, called the Equivalent Rectangular Bandwidth (ERB) scale [88], is based on different psychoacoustic experiments resulting in generally narrower critical bandwidth estimates.

This appendix, condensed from [269,268], describes a useful class of approximate Bark/ERB frequency warpings that may be implemented using a *bilinear transform* (first-order conformal map of the unit circle to itself in the $z$ plane). Such warpings *preserve order* in filter-design applications. That is, the warping can be undone by the inverse bilinear transform which, because it is first order, does not change the order of the filter that was designed over the warped frequency axis.

## The Bark Frequency Scale

Based on the results of many psychoacoustic experiments, the *Bark scale* is defined so that the critical bands of human hearing each have a width of one Bark. By representing spectral energy (in dB) over the Bark scale, a closer correspondence is obtained with spectral information processing in the ear (§7.3). The Bark scale ranges from 1 to 24 Barks, corresponding to the first 24 critical bands of hearing [304]. The published Bark band edges are given in Hertz as [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500]. The published band centers in Hertz are [50, 150, 250, 350, 450, 570, 700, 840, 1000, 1170, 1370, 1600, 1850, 2150, 2500, 2900, 3400, 4000, 4800, 5800, 7000, 8500, 10500, 13500].

These center frequencies and bandwidths are to be interpreted as samplings of a continuous variation in the frequency response of the ear to a sinusoid or narrow-band noise process. That is, critical-band-shaped masking patterns should be seen as forming around specific stimuli in the ear rather than being associated with a specific fixed filter bank in the ear.

Note that since the Bark scale is defined only up to 15.5 kHz, the highest sampling rate for which the Bark scale is defined up to the Nyquist limit, without requiring extrapolation, is 31 kHz. The 25th Bark band certainly extends above 19 kHz (the sum of the 24th Bark band edge and the 23rd critical bandwidth), so that a sampling rate of 40 kHz is implicitly supported by the data. We have extrapolated the Bark band edges in our work, appending the values [20500, 27000] so that sampling rates up to 54 kHz are defined. While human hearing generally does not extend above 20 kHz, audio sampling rates of 48 kHz or higher are common in practice.

The Bark scale is defined above in terms of frequency in Hz versus Bark number. For computing optimal bilinear transformations, it is preferable to optimize the fit to the *inverse* of this map, *i.e.*, Barks versus Hz, so that the mapping error will be measured in Barks rather than Hz.
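The Barks-versus-Hz direction of the map can be sketched by interpolating the published band edges. The following is a minimal illustration, not the code used in [269]: the function name `hz_to_bark`, the choice of piecewise-linear interpolation, and the convention that band edge number $k$ (counting from zero) corresponds to $k$ Barks are all assumptions made here.

```python
from bisect import bisect_right

# Published Bark band edges in Hz (from the text above).
BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
                 9500, 12000, 15500]
# Extrapolated edges mentioned in the text, supporting rates up to 54 kHz.
EXTRAPOLATED_EDGES_HZ = [20500, 27000]


def hz_to_bark(f_hz, edges=BARK_EDGES_HZ):
    """Piecewise-linear Barks-versus-Hz map.

    Assumption: edge k of the table is assigned the value k Barks, and
    values inside a band are linearly interpolated.
    """
    if not 0 <= f_hz <= edges[-1]:
        raise ValueError("frequency outside the tabulated range")
    if f_hz == edges[-1]:
        return float(len(edges) - 1)
    k = bisect_right(edges, f_hz) - 1  # edges[k] <= f_hz < edges[k + 1]
    return k + (f_hz - edges[k]) / (edges[k + 1] - edges[k])
```

Under this convention the published band centers land on half-integer Bark values, and passing `BARK_EDGES_HZ + EXTRAPOLATED_EDGES_HZ` as `edges` extends the map to 27 kHz.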

## The Bilinear Transform

The formula for a general first-order (bilinear) conformal mapping of functions of a complex variable is conveniently expressed by [42, page 75]

$$\frac{(\zeta-\zeta_1)(\zeta_2-\zeta_3)}{(\zeta-\zeta_3)(\zeta_2-\zeta_1)} = \frac{(z-z_1)(z_2-z_3)}{(z-z_3)(z_2-z_1)}.$$

It can be seen that choosing three specific points and their images determines the mapping for all $z$ and $\zeta$. Bilinear transformations map circles and lines into circles and lines (lines being viewed as circles passing through the point at infinity). In digital audio, where both domains are "$z$ planes," we normally want to map the unit circle to itself, with dc mapping to dc ($z = 1 \mapsto \zeta = 1$) and half the sampling rate mapping to half the sampling rate ($z = -1 \mapsto \zeta = -1$). Making these substitutions in the general formula above leaves us with transformations of the form

$$\zeta = \mathcal{A}_\rho(z) \triangleq \frac{z - \rho}{1 - \rho z}. \qquad \text{(E.1)}$$

The constant $\rho$ provides one remaining degree of freedom which can be used to map any particular frequency $\omega$ (corresponding to the point $e^{j\omega}$ on the unit circle) to a new location $a(\omega)$. All other frequencies will be *warped* accordingly. Note that this class of "circle to circle" bilinear transformations takes the form of the transfer function of an *allpass filter*. We therefore call it an "allpass transformation". The "allpass coefficient" $\rho$ can be written in terms of the frequencies $\omega$ and $a(\omega)$ as

$$\rho = \frac{\sin\{[a(\omega)-\omega]/2\}}{\sin\{[a(\omega)+\omega]/2\}}. \qquad \text{(E.2)}$$

In this form, it is clear that $\rho$ is real, and that the inverse of $\mathcal{A}_\rho$ is $\mathcal{A}_{-\rho}$. Also, since $0 < \omega, a(\omega) < \pi$, and $a(\omega) > \omega$ for an audio warping (where low frequencies must be "stretched out" relative to high frequencies), we have $0 < \rho < 1$ for audio-type mappings from the $z$ plane to the $\zeta$ plane.
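The allpass transformation and its coefficient can be checked numerically. The sketch below assumes the allpass form $\zeta = (z-\rho)/(1-\rho z)$ evaluated at $z = e^{j\omega}$; the function names are hypothetical and chosen for illustration only.

```python
import cmath
import math


def allpass_warp(omega, rho):
    """Warped frequency a(omega): the angle of (z - rho)/(1 - rho*z) at z = e^{j*omega}."""
    z = cmath.exp(1j * omega)
    zeta = (z - rho) / (1 - rho * z)
    return cmath.phase(zeta)


def allpass_coefficient(omega, a):
    """Real coefficient rho that maps frequency omega to a (both strictly between 0 and pi)."""
    return math.sin((a - omega) / 2) / math.sin((a + omega) / 2)
```

Warping a frequency with coefficient `rho` and then warping the result with `-rho` returns the original frequency, which is the inverse property noted above.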

## Optimal Bilinear Bark Warping

It turns out that a first-order conformal map (bilinear transform) can provide a surprisingly close match to the Bark frequency scale [268,269]. This is shown in Fig. E.1.

### Computing $\rho$

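Our goal is to find the allpass coefficient $\rho$ such that the frequency mapping

$$a(\omega) = \mathrm{angle}\left\{\mathcal{A}_\rho\left(e^{j\omega}\right)\right\}$$

best approximates the Bark scale at a given sampling rate $f_s$. Because $a(\omega)$ depends nonlinearly on $\rho$, an *equation error* is introduced, as is common practice in developing solutions to nonlinear system identification problems [152]. Consider mapping the frequency $z = e^{j\omega}$ via the allpass transformation $\zeta = \mathcal{A}_\rho(z) = (z-\rho)/(1-\rho z)$. Multiplying this relation by the denominator $1 - \rho z$, and substituting the desired warped frequency plus an error term for $\zeta$, gives an error expression that is linear in $\rho$, so that a weighted least-squares solution for $\rho$ can be written in closed form [269]. When the weighting matrix is diagonal, this solution reduces to a ratio of weighted sums over the frequency samples.

The *k*th diagonal element of an optimal diagonal weighting matrix $\mathbf{V}$ is given in [269]. Note that the desired weighting depends on the unknown map parameter $\rho$. To overcome this difficulty, we suggest first estimating $\rho$ using $\mathbf{V} = \mathbf{I}$, where $\mathbf{I}$ denotes the identity matrix, and then computing $\rho$ again using the weighting based on the unweighted solution. This is analogous to the *Steiglitz-McBride algorithm* for converting an equation-error minimizer to the more desired "output-error" minimizer using an iteratively computed weight function [151].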
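The two-pass estimate can be sketched as follows. This is a reconstruction under stated assumptions, not the exact expressions of [269]. Write $z = e^{j\omega}$, let $b(\omega)$ denote the desired warped frequency in radians per sample, and take the equation error to be $(e^{jb} - z) - \rho\,(e^{jb}z - 1)$, whose magnitude is $2\,\left|\sin[(b-\omega)/2] - \rho\,\sin[(b+\omega)/2]\right|$. Minimizing the weighted sum of squares over the frequency samples then gives a ratio of weighted sums. The second-pass weighting $1/|1-\rho e^{j\omega}|^2$ is an assumption chosen to undo the denominator factor introduced by the multiplication; the weighting in [269] may differ. All function names are hypothetical.

```python
import math


def allpass_warp(omega, rho):
    """a(omega) = omega + 2*atan(rho*sin(omega) / (1 - rho*cos(omega)))."""
    return omega + 2.0 * math.atan2(rho * math.sin(omega), 1.0 - rho * math.cos(omega))


def equation_error_rho(omegas, targets, weights=None):
    """Weighted least-squares rho minimizing sum_k v_k * (s_d[k] - rho*s_s[k])**2."""
    if weights is None:
        weights = [1.0] * len(omegas)  # identity weighting
    num = den = 0.0
    for w, b, v in zip(omegas, targets, weights):
        s_d = math.sin((b - w) / 2)  # "difference" term
        s_s = math.sin((b + w) / 2)  # "sum" term
        num += v * s_d * s_s
        den += v * s_s * s_s
    return num / den


def two_pass_rho(omegas, targets):
    """Unweighted estimate first, then a re-estimate with weights from that result."""
    rho0 = equation_error_rho(omegas, targets)
    # Assumed weighting: 1/|1 - rho0*e^{j*w}|^2 (not taken from the source).
    weights = [1.0 / (1.0 - 2.0 * rho0 * math.cos(w) + rho0 * rho0) for w in omegas]
    return equation_error_rho(omegas, targets, weights)


# Synthetic check: targets generated by an exact allpass warp with rho = 0.7.
omegas = [math.pi * k / 32 for k in range(1, 32)]
targets = [allpass_warp(w, 0.7) for w in omegas]
rho_hat = two_pass_rho(omegas, targets)  # recovers 0.7 up to rounding
```

For real Bark data the targets are not exactly allpass-warped, so the two passes generally give slightly different estimates.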

### Optimal Frequency Warpings

In [269], optimal allpass coefficients $\rho^*$ were computed for sampling rates of twice the Bark band-edge frequencies by means of four different optimization methods:

- Minimize the peak arc-length error at each sampling rate to obtain the optimal Chebyshev allpass parameter.
- Minimize the sum of squared arc-length errors to obtain the optimal least-squares allpass parameter.
- Use the closed-form weighted equation-error solution (E.3.1) computed twice, first with the identity weighting, and second with the weighting set from the first (unweighted) solution, to obtain the optimal "weighted equation error" solution.
- Fit the function $\gamma_1\left[\frac{2}{\pi}\arctan(\gamma_2 f_s)\right]^{1/2} + \gamma_3$ to the optimal Chebyshev allpass parameter via Chebyshev optimization with respect to the coefficients $\gamma_1$, $\gamma_2$, and $\gamma_3$. We will refer to the resulting function as the "arctangent approximation" (or, less formally, the "Barktan formula"), and note that it is easily computed directly from the sampling rate.

The rms and peak frequency-mapping errors are plotted for all four cases (Chebyshev, least squares, weighted equation-error, and arctangent approximation). The conformal-map fit to the Bark scale is generally excellent in all cases. We see that the rms error is essentially identical in the first three cases, although the Chebyshev rms error is visibly larger below 10 kHz. Similarly, the peak error is essentially the same for least squares and weighted equation error, with the Chebyshev case being able to shave almost 0.1 Bark from the maximum error at high sampling rates. The arctangent formula shows up to a tenth of a Bark larger peak error at sampling rates of 15-30 kHz and 54 kHz, but otherwise it performs very well; at 41 kHz and below 12 kHz the arctangent approximation is essentially optimal in all senses considered.

At sampling rates up to the maximum non-extrapolated sampling rate of 31 kHz, the peak mapping errors are all much less than one Bark (0.64 Barks for the Chebyshev case and 0.67 Barks for the two least squares cases). The mapping errors in Barks can be seen to increase almost linearly with sampling rate. However, the irregular nature of the Bark-scale data results in a nonmonotonic relationship at lower sampling rates.

The specific frequency mapping errors versus frequency at the 31 kHz sampling rate (the same case shown in Fig. E.1) are plotted in Fig. E.5. Again, all four cases are overlaid, and again the least squares and weighted equation-error cases are essentially identical. By forcing equal and opposite peak errors, the Chebyshev case is able to lower the peak error from 0.67 to 0.64 Barks. A difference of 0.03 Barks is probably insignificant for most applications. The peak errors occur at 1.3 kHz and 8.8 kHz where the error is approximately 2/3 Bark. The arctangent formula peak error is 0.73 Barks at 8.8 kHz, but in return, its secondary error peak at 1.3 kHz is only 0.55 Barks. In some applications, such as when working with oversampled signals, higher accuracy at low frequencies at the expense of higher error at very high frequencies may be considered a desirable tradeoff.

We see that the mapping falls "behind" a bit as frequency increases from zero to 1.3 kHz, mapping linear frequencies slightly below the desired corresponding Bark values; then, the mapping "catches up," reaching an error of 0 Barks near 3 kHz. Above 3 kHz, it gets "ahead" slightly, with frequencies in Hz being mapped a little too high, reaching the positive error peak at 8.8 kHz, after which it falls back down to zero error at $f_s/2$. (Recall that dc and half the sampling rate are always points of zero error by construction.)
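A mapping error in Barks can be evaluated for any candidate coefficient along the following lines. This is only a sketch: it samples the error at the published band edges, assumes that edge $k$ corresponds to $k$ Barks and that half the sampling rate coincides with the last tabulated edge, and uses an illustrative coefficient of 0.7 that is not taken from the source. It is not expected to reproduce the plotted curves exactly.

```python
import math

BARK_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
                 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
                 9500, 12000, 15500]


def bark_mapping_errors(rho, fs_hz=31000.0, edges=BARK_EDGES_HZ):
    """Error in Barks at each band edge: warped value minus the edge's Bark index."""
    n_barks = len(edges) - 1  # Barks at fs/2, assuming fs/2 == edges[-1]
    errors = []
    for k, f in enumerate(edges):
        omega = 2.0 * math.pi * f / fs_hz
        # Phase of the allpass map (z - rho)/(1 - rho*z) at z = e^{j*omega}.
        a = omega + 2.0 * math.atan2(rho * math.sin(omega), 1.0 - rho * math.cos(omega))
        errors.append(n_barks * a / math.pi - k)
    return errors


errors = bark_mapping_errors(0.7)  # 0.7 is an illustrative value only
peak = max(abs(e) for e in errors)
rms = math.sqrt(sum(e * e for e in errors) / len(errors))
```

The errors at dc and at half the sampling rate vanish by construction, so `peak` is determined by the interior band edges.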

### Bark Relative Bandwidth Mapping Error

The *slope* of the frequency versus warped-frequency curve can be interpreted as being proportional to critical bandwidth, since a unit interval (one Bark) on the warped-frequency axis is magnified by the slope to restore the band to its original size (one critical bandwidth). It is therefore interesting to look at the *relative slope error*, *i.e.*, the error in the slope of the frequency mapping divided by the ideal Bark-map slope. We interpret this error measure as the *relative bandwidth-mapping error* (RBME).

The RBME is plotted in Fig. E.6 for a 31 kHz sampling rate. The worst case is 21% for the Chebyshev case and 20% for both least-squares cases. When the mapping coefficient is explicitly optimized to minimize RBME, the results of Fig. E.7 are obtained: the Chebyshev peak error drops from 21% down to 18%, while the least-squares cases remain unchanged at 20% maximum RBME. A 3% change in RBME is comparable to the 0.03 Bark peak-error reduction seen in Fig. E.5 when using the Chebyshev norm instead of the $L_2$ norm; again, such a small difference is not likely to be significant in most applications.
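For reference, the slope of the allpass frequency map has a simple closed form, which makes the RBME easy to evaluate numerically. The following is a sketch rather than the derivation used in [269]; the symbol $b(\omega)$ for the ideal (normalized) Bark map and the sign convention of the error are assumptions made here. Taking the warped frequency as the phase of $\zeta = (z-\rho)/(1-\rho z)$ at $z = e^{j\omega}$:

$$
\begin{aligned}
a(\omega) &= \omega + 2\arctan\frac{\rho\sin\omega}{1-\rho\cos\omega},\\
\frac{da}{d\omega} &= \frac{1-\rho^2}{1-2\rho\cos\omega+\rho^2},\\
\text{RBME}(\omega) &= \frac{d\omega/da - d\omega/db}{d\omega/db} = \frac{b'(\omega)}{a'(\omega)} - 1 .
\end{aligned}
$$

The slope $da/d\omega$ equals $(1+\rho)/(1-\rho)$ at dc and $(1-\rho)/(1+\rho)$ at half the sampling rate, so a single parameter fixes the stretching at both ends of the frequency axis.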

### Error Significance

In one study, young normal listeners exhibited a standard deviation in their measured auditory bandwidths (based on notched-noise masking experiments) on the order of 10% of center frequency [178]. Therefore, a 20% peak error in mapped bandwidth (typical for sampling rates approaching 40 kHz) could be considered significant. However, the *range* of auditory-filter bandwidths measured in 93 young normal subjects at 2 kHz [178] was 230 to 410 Hz, which is -26% to +32% relative to 310 Hz. In [298], 40 subjects were measured, yielding auditory-filter bandwidths between -33% and +65%, with a standard deviation of 18%. It may thus be concluded that a worst-case mapping error on the order of 20%, while probably detectable by "golden ears" listeners, lies well within the range of experimental deviations in the empirical measurement of auditory bandwidth.

As a worst-case example of how the 18% peak bandwidth-mapping error in Fig. E.7 might correspond to an audible distortion, consider one critical band of noise centered at the frequency of maximum negative mapping error, scaled to be the same loudness as a single critical band of noise centered at the frequency of maximum positive error. The systematic nature of the mapping error results in a narrowing of the lower band and expansion of the upper band by about 1.7 dB. As a result, over the warped frequency axis, the upper band will be effectively *emphasized* over the lower band by about 3 dB.

### Arctangent Approximations for $\rho^*(f_s)$

This subsection provides further details on the arctangent approximation for the optimal allpass coefficient as a function of sampling rate,

$$\rho^*(f_s) \approx 1.0674\left[\frac{2}{\pi}\arctan(0.06583\,f_s)\right]^{1/2} - 0.1916, \qquad \text{(E.3.5)}$$

where $f_s$ is in kHz. Compared with other spline or polynomial approximations, the arctangent form was found to provide a more parsimonious expression at a given accuracy level. The idea was that the arctangent function provided a mapping from the interval $[0,\infty)$, the domain of $f_s$, to the interval $[0,1)$, the range of $\rho$. The additive component allowed $\rho$ to be zero at smaller sampling rates, where the Bark scale is linear with frequency. As an additional benefit, the arctangent expression was easily inverted to give sampling rate in terms of the allpass coefficient $\rho$:

$$f_s = \frac{1}{0.06583}\tan\left[\frac{\pi}{2}\left(\frac{\rho+0.1916}{1.0674}\right)^2\right].$$

When the optimality criterion is chosen to minimize relative bandwidth mapping error (relative map *slope* error), the arctangent formula optimization yields a second set of coefficients for the same functional form [269]. The performance of this formula is shown in Fig. E.8. It tends to follow the performance of the optimal least squares map parameter even though the peak parameter error was minimized relative to the optimal Chebyshev map. At 54 kHz there is an additional 3% bandwidth error due to the arctangent approximation, and near 10 kHz the additional error is about 4%; at other sampling rates, the performance of the RBME arctangent approximation is better, and like (E.3.5), it is extremely accurate at 41 kHz.
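The arctangent approximation and its inverse are easy to evaluate directly. The sketch below assumes the functional form $\gamma_1[(2/\pi)\arctan(\gamma_2 f_s)]^{1/2} + \gamma_3$ with $f_s$ in kHz; the default coefficient values are the Bark-case values as recalled from the published formula and should be verified against [269] before use. The function names are hypothetical.

```python
import math

# Assumed Bark-case coefficients (fs in kHz); verify against [269].
G1, G2, G3 = 1.0674, 0.06583, -0.1916


def barktan_rho(fs_khz, g1=G1, g2=G2, g3=G3):
    """Approximate optimal allpass coefficient for a sampling rate given in kHz."""
    return g1 * math.sqrt((2.0 / math.pi) * math.atan(g2 * fs_khz)) + g3


def barktan_fs_khz(rho, g1=G1, g2=G2, g3=G3):
    """Inverse relation: sampling rate in kHz for a given allpass coefficient."""
    return math.tan((math.pi / 2.0) * ((rho - g3) / g1) ** 2) / g2
```

Because the coefficients are parameters, the same two functions can be reused for any other fit of this form, such as one optimized for a different error criterion.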

## Application to Audio Filter Design

Frequency warping is generally employed in audio filter design by

- warping the desired frequency response, thus "horizontally stretching" the more important low-frequency region of the spectrum,
- performing a filter design over the warped frequency axis, and
- transforming the resulting filter to eliminate the frequency warp, returning it to the normal frequency axis.

The transformation in the last step can be carried out by a conformal map (*i.e.*, substituting some rational function of $z$ for $z$ in the filter transfer function). Since bilinear-transform frequency mappings are first order, when the resulting filter is transformed back to unwarped form, its order remains the same [258].
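The order-preserving substitution can be illustrated on filter coefficients. The sketch below assumes a rational filter $H(\zeta) = B(\zeta^{-1})/A(\zeta^{-1})$ with coefficients listed in ascending powers of $\zeta^{-1}$, and substitutes $\zeta^{-1} = (z^{-1}-\rho)/(1-\rho z^{-1})$, which is equivalent to $\zeta = (z-\rho)/(1-\rho z)$. All function names are hypothetical, and this is not the code used for the design example in the source.

```python
import cmath


def poly_mul(p, q):
    """Multiply two polynomials given as ascending coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def poly_pow(p, n):
    """Raise a polynomial to a nonnegative integer power."""
    out = [1.0]
    for _ in range(n):
        out = poly_mul(out, p)
    return out


def substitute_allpass(coeffs, rho, order):
    """Coefficients of sum_k c_k * (z^-1 - rho)^k * (1 - rho*z^-1)^(order - k)."""
    c = list(coeffs) + [0.0] * (order + 1 - len(coeffs))
    out = [0.0] * (order + 1)
    for k, ck in enumerate(c):
        term = poly_mul(poly_pow([-rho, 1.0], k), poly_pow([1.0, -rho], order - k))
        for i, t in enumerate(term):
            out[i] += ck * t
    return out


def unwarp_filter(b, a, rho):
    """Return (b', a') with B'(z^-1)/A'(z^-1) = H((z - rho)/(1 - rho*z)); same order."""
    order = max(len(b), len(a)) - 1
    return substitute_allpass(b, rho, order), substitute_allpass(a, rho, order)


def freq_response(b, a, z):
    """Evaluate B(z^-1)/A(z^-1) at a complex point z."""
    zi = 1.0 / z
    num = sum(bk * zi ** k for k, bk in enumerate(b))
    den = sum(ak * zi ** k for k, ak in enumerate(a))
    return num / den
```

Numerator and denominator are padded to a common order so that the shared factor $(1-\rho z^{-1})^N$ cancels in the ratio. Applying the function a second time with `-rho` recovers the original coefficients up to a common scale factor.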

### Filter Design Example

The filter-design procedure in this example was as follows:

- The optimal allpass coefficient $\rho^*(f_s)$ was found using (E.3.5).
- The desired frequency response, defined on a linear frequency axis, was warped to an approximate Bark scale using the Bark bilinear transform.
- A parametric ARMA model was fit to the desired Bark-warped frequency response over the unit circle.
- Finally, the inverse Bark bilinear transform was used to "unwarp" the modeled system to a linear frequency axis.

Frequency warping also tends to improve the *numerical conditioning* of the filter design problem; this applies also to optimization under the Hankel norm, which includes an optimal Chebyshev design internally as an intermediate step. Further filter-design examples, including more on the Hankel-norm case, may be found in [258].

## Equivalent Rectangular Bandwidth

It also turns out that a first-order conformal map (bilinear transform) can provide a good match to the ERB scale [269] as well.

Moore and Glasberg [177] have revised Zwicker's loudness model to better explain (1) how equal-loudness contours change as a function of level, (2) why loudness remains constant as the bandwidth of a fixed-intensity sound increases up to the critical bandwidth, and (3) the loudness of partially masked sounds. The modification that is relevant here is the replacement of the Bark scale by the *equivalent rectangular bandwidth* (ERB) scale. The ERB of the auditory filter is assumed to be closely related to the critical bandwidth, but it is measured using the *notched-noise* method [205,206,251,181,87] rather than classical masking experiments involving a narrow-band masker and probe tone [306,307,304]. As a result, the ERB is said not to be affected by the detection of beats or intermodulation products between the signal and masker. Since this scale is defined analytically, it is also more smoothly behaved than the Bark scale data.

Auditory bandwidth can also be related to distance, or *place*, along the basilar membrane [96, p. 2601].

The *ERB scale* is defined as the number of ERBs below each frequency [177]:

$$\text{ERBS}(f) = 21.4\,\log_{10}(0.00437\,f + 1) \qquad \text{(E.5)}$$

for $f$ in Hz. An overlay of the normalized Bark and ERB frequency warpings is shown in Fig. E.11. The ERB warping is determined by scaling the inverse of (E.5), evaluated along a uniform frequency grid from zero to the number of ERBs at half the sampling rate, so that dc maps to zero and half the sampling rate maps to $\pi$.

Proceeding in the same manner as for the Bark-scale case, allpass coefficients giving a best approximation to the ERB-scale warping were computed for sampling rates near twice the Bark band-edge frequencies (chosen to facilitate comparison between the ERB and Bark cases). The resulting optimal map coefficients are shown in Fig. E.12. The allpass parameter increases with increasing sampling rate, as in the Bark-scale case, but it covers a significantly narrower range, as a comparison with Fig. E.3 shows. Also, the Chebyshev solution is now systematically larger than the least-squares solutions, and the least-squares and weighted equation-error cases are no longer essentially identical. The fact that the arctangent formula is optimized for the Chebyshev case is much more evident in the error plot of Fig. E.12b than it was in Fig. E.3b for the Bark warping parameter.
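The ERB-scale quantities used here can be sketched in a few lines. The constants below are the commonly quoted Glasberg and Moore values (bandwidth $0.108 f + 24.7$ Hz and ERB number $21.4\log_{10}(0.00437 f + 1)$, with $f$ in Hz); they are stated as assumptions and should be checked against [177]. The function names, including the normalized warping target, are hypothetical.

```python
import math


def erb_bandwidth_hz(f_hz):
    """Assumed ERB of the auditory filter centered at f_hz, in Hz."""
    return 0.108 * f_hz + 24.7


def hz_to_erbs(f_hz):
    """Assumed ERB scale: number of ERBs below frequency f_hz."""
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)


def erbs_to_hz(erbs):
    """Inverse of hz_to_erbs."""
    return (10.0 ** (erbs / 21.4) - 1.0) / 0.00437


def erb_warping_target(omega, fs_hz):
    """Desired warped frequency in [0, pi]: ERB number scaled so fs/2 maps to pi."""
    f_hz = omega * fs_hz / (2.0 * math.pi)
    return math.pi * hz_to_erbs(f_hz) / hz_to_erbs(fs_hz / 2.0)
```

A coefficient for the ERB case can then be fit by passing samples of `erb_warping_target` as the desired warped frequencies to whichever estimator is used for the Bark case.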

### ERB Relative Bandwidth Mapping Error

The optimal relative bandwidth-mapping error (RBME) for the ERB case is plotted in Fig. E.15 for a 31 kHz sampling rate. The peak error has grown from close to 20% for the Bark-scale case to more than 60% for the ERB case. Thus, frequency intervals are mapped to the ERB scale with up to three times as much relative error (60%) as when mapping to the Bark scale (20%). The continued narrowing of the auditory filter bandwidth as frequency decreases on the ERB scale results in the conformal map not being able to supply sufficient stretching of the low-frequency axis. The Bark-scale case, on the other hand, is much better served at low frequencies by the first-order conformal map.

### Arctangent Approximations for $\rho^*(f_s)$, ERB Case

For an approximation to the optimal Chebyshev ERB frequency mapping, the arctangent formula becomes

$$\rho^*(f_s) \approx 0.7446\left[\frac{2}{\pi}\arctan(0.1418\,f_s)\right]^{1/2} + 0.03237, \qquad \text{(E.5.2)}$$

where $f_s$ is in kHz. This formula is plotted along with the various optimal curves in Fig. E.12a, and the approximation error is shown in Fig. E.12b. The performance of the arctangent approximation can be seen in Fig. E.13.

When the optimality criterion is chosen to minimize relative bandwidth mapping error in the ERB case, the arctangent formula optimization yields a separate set of coefficients for the same functional form [269]. The performance of this formula is shown in Fig. E.16. It follows the optimal Chebyshev map parameter very well.

## Directions for Improvements

Audio conformal maps can be adjusted by using a more general error weighting versus frequency. For example, the weighting can be set to zero above some frequency limit along the unit circle. A more general weighting can also be used to obtain improved accuracy in specific desired frequency ranges. Again, these refinements would seem to be of interest primarily for the ERB-scale and other mappings, since the Bark-scale warping is excellent already. The diagonal weighting matrix in the weighted equation error solution (E.3.1) can be multiplied by any desired application-dependent weighting.

As another variation, an auditory frequency scale could be defined based on the cochlear frequency-to-place function [96]. In this case, a close relationship still exists between equal-place increments along the basilar membrane and equal bandwidth increments in the defined audio filter bank. Preliminary comparisons [96, Fig. 9] indicate that the first-order conformal map errors for this case are qualitatively between the ERB and Bark-scale cases.

The first-order conformal map works best when the auditory filter bandwidths level off to a minimum width at low frequencies, as they do in the Bark-scale case below 500 Hz. Thus, the question of the "audio fidelity" of the first-order conformal map is directly tied to the question of what is really the best frequency resolution to provide at low frequencies in the auditory filter bank.

## Summary

The first-order "allpass" conformal map which maps the unit circle to itself was configured to approximate frequency warpings from a linear frequency scale to either a Bark scale or an ERB frequency scale for a wide variety of sampling rates. The accuracy of this warping is extremely good for the Bark-scale case, and fair also for the ERB case; the first-order conformal map shows significantly more error in the ERB case (about three times that of the Bark-scale case) due to the narrower resolution bandwidths of the ERB scale at low frequencies.

A closed-form expression was derived for the allpass coefficient which minimizes the $L_2$ norm of the weighted equation error between samples of the allpass warping and the desired Bark or ERB warpings. The weighting function was designed to give estimates as close as possible to the optimal least-squares estimate, and comparisons showed this to be well achieved, especially in the Bark-scale case. A simple, closed-form, invertible expression which comes very close to the optimal Chebyshev allpass coefficient vs. sampling rate was given in (E.3.5) for the Bark-scale case and in (E.5.2) for the ERB-scale case.

Three optimal conformal maps were defined based on Chebyshev, least squares, and weighted equation-error approximation, and all three mappings were found to be psychoacoustically identical, for most practical purposes, in the Bark-scale case. When using optimal maps, the peak relative bandwidth mapping error is about 20% in the Bark-scale case and 60% in the ERB-scale case.

We conclude that the first-order conformal map is a highly useful tool for audio digital filter design and related applications in digital audio signal processing which may benefit from an order-invariant mapping of the unit circle from a linear frequency scale to an approximate auditory frequency scale.

Matlab code for the plots, optimizations, and filter design example presented here is available online at `http://ccrma.stanford.edu/~jos/bbt/bbt.html`
