Auditory Filter Banks
Auditory filter banks are non-uniform bandpass filter banks designed to imitate the frequency resolution of human hearing [307,180,87,208,255]. Classical auditory filter banks include constant-Q filter banks such as the widely used third-octave filter bank. Digital constant-Q filter banks have also been developed for audio applications [29,30]. More recently, constant-Q filter banks for audio have been devised based on the wavelet transform, including the auditory wavelet filter bank. Auditory filter banks have also been based more directly on psychoacoustic measurements, leading to approximations of the auditory filter frequency response in terms of a Gaussian function, a "rounded exponential", and more recently the gammatone (or "Patterson-Holdsworth") filter bank [208,255]. The gamma-chirp filter bank further adds a level-dependent asymmetric correction to the basic gammatone channel frequency response, thereby providing a more accurate approximation to the auditory frequency response [112,111].
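As a rough illustration of the gammatone filter bank, the Python sketch below builds channel impulse responses of the form t^(n-1) e^(-2πbt) cos(2πf_c t), with center frequencies spaced uniformly on an ERB-number scale. The ERB formula 24.7(4.37f/1000 + 1) Hz, the order n = 4, and the bandwidth b = 1.019 ERB(f_c) are common choices from the psychoacoustics literature, assumed here rather than taken from this text; the function names are hypothetical, and the level-dependent gamma-chirp correction is not modeled.

```python
import numpy as np

def erb(f):
    """Equivalent rectangular bandwidth in Hz (Glasberg-Moore formula, assumed)."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def erb_space(f_lo, f_hi, num):
    """Center frequencies spaced uniformly on the ERB-number scale."""
    e_lo = 21.4 * np.log10(4.37 * f_lo / 1000.0 + 1.0)
    e_hi = 21.4 * np.log10(4.37 * f_hi / 1000.0 + 1.0)
    e = np.linspace(e_lo, e_hi, num)
    # Invert the ERB-number mapping back to Hz.
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def gammatone_ir(fc, fs, order=4, duration=0.1):
    """Sampled gammatone impulse response t^(n-1) exp(-2 pi b t) cos(2 pi fc t).

    Assumed parameters: order 4 and b = 1.019 * ERB(fc). The response is
    scaled so that its DTFT magnitude at fc is exactly 1 (0 dB).
    """
    t = np.arange(int(round(duration * fs))) / fs
    b = 1.019 * erb(fc)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    gain_at_fc = np.abs(np.sum(g * np.exp(-2j * np.pi * fc * t)))
    return g / gain_at_fc
```

For example, `erb_space(100.0, 4000.0, 16)` returns 16 center frequencies whose spacing widens with frequency. At high frequencies the ERB approaches a fixed fraction of the center frequency (about 0.108 f), so the bank behaves approximately like a constant-Q bank there, while at low frequencies the bandwidths level off toward 24.7 Hz.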
The output power from an auditory filter bank at a particular time defines the so-called excitation pattern versus frequency at that time [87,179,305]. The excitation pattern may be considered analogous to the average power of the physical excitation applied to the hair cells of the inner ear by the vibrating basilar membrane in the cochlea. The shape of the excitation pattern can thus be thought of as approximating the envelope of the basilar membrane vibration.
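The definition above can be sketched directly: filter the signal through each channel and take the mean-square output over a short frame. This Python example assumes FIR gammatone channels (order 4, bandwidth 1.019 ERB, normalized to unity gain at each center frequency) and a rectangular averaging frame; the helper names, channel placement, and frame length are hypothetical choices, not prescribed by the text.

```python
import numpy as np

def gammatone_ir(fc, fs, order=4, duration=0.1):
    """Gammatone impulse response (assumed parameters), unity gain at fc."""
    t = np.arange(int(round(duration * fs))) / fs
    b = 1.019 * 24.7 * (4.37 * fc / 1000.0 + 1.0)  # 1.019 * ERB(fc), in Hz
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.abs(np.sum(g * np.exp(-2j * np.pi * fc * t)))

def excitation_pattern(x, filters, frame_start, frame_len):
    """One excitation-pattern snapshot: mean-square output of each channel.

    x       : 1-D input signal
    filters : list of FIR impulse responses, one per auditory channel
    Returns an array holding one power value per channel for the frame
    x[frame_start : frame_start + frame_len].
    """
    powers = []
    for h in filters:
        y = np.convolve(x, h)[: len(x)]  # causal FIR filtering, same length as x
        seg = y[frame_start : frame_start + frame_len]
        powers.append(np.mean(seg ** 2))
    return np.array(powers)
```

Applied to a unit-amplitude 1 kHz sinusoid sampled at 16 kHz, with channels at 250, 500, 1000, 2000, and 4000 Hz and a frame starting after the filters have settled, the pattern peaks in the 1 kHz channel at a power of about 0.5. Sliding the frame along the signal traces the excitation pattern as a function of time.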
The excitation pattern produced from an auditory filter bank, together with appropriate equalization (frequency-dependent gain) and nonlinear compression, can be used to define specific loudness as a function of time and frequency [306,305,177,182,88].
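The step from excitation to specific loudness can be illustrated with a deliberately simple model: multiply each channel's power by an equalization gain, then apply a compressive power law. The exponent 0.3 used here reflects the rough rule of thumb that loudness doubles for each 10 dB increase in intensity (10^0.3 ≈ 2); the loudness models cited above use more elaborate, level-dependent compression, and the gain vector is a placeholder rather than a standard equalization curve. The function names are hypothetical.

```python
import numpy as np

def specific_loudness(excitation, eq_gain, alpha=0.3):
    """Illustrative specific loudness: equalize, then compress.

    excitation : channel powers in linear units, shape (..., channels);
                 a 2-D array of shape (frames, channels) gives loudness vs time
    eq_gain    : per-channel power gains standing in for frequency-dependent
                 equalization (placeholder values, assumed)
    alpha      : compressive exponent (0.3 is an assumed rule-of-thumb value)
    """
    e = np.asarray(excitation, dtype=float) * np.asarray(eq_gain, dtype=float)
    return np.maximum(e, 0.0) ** alpha

def total_loudness(specific):
    """Crude total loudness: sum of specific loudness across channels."""
    return np.sum(specific, axis=-1)
```

Because `eq_gain` broadcasts over the last axis, the same call handles a single excitation snapshot or a whole time sequence of them. Summing over channels gives only a crude total-loudness estimate; published models integrate specific loudness over an auditory frequency scale.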
Because the channels of an auditory filter bank are distributed non-uniformly versus frequency, they can be regarded as a basis for a non-uniform sampling of the frequency axis. From this point of view, the auditory-filter frequency response becomes the (frequency-dependent) interpolation kernel used to extract a frequency sample at the filter's center frequency. See §7.3.3 below for further details.
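To make the sampling interpretation concrete, the sketch below computes one sample per channel by weighting a uniformly sampled power spectrum with that channel's power response and integrating over frequency, so each row of the weighting matrix plays the role of the interpolation kernel for one non-uniformly placed sample. The Gaussian shape (in the spirit of the Gaussian approximation mentioned earlier) and the ERB formula are assumptions for illustration; the power response exp(-π((f - f_c)/ERB)^2) is chosen because its equivalent rectangular bandwidth is exactly ERB(f_c). The function names are hypothetical.

```python
import numpy as np

def erb(f):
    """Equivalent rectangular bandwidth in Hz (Glasberg-Moore formula, assumed)."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def auditory_samples(power_spectrum, freqs, centers):
    """Sample a uniform-grid power spectrum at non-uniform center frequencies.

    power_spectrum : power density on the uniform grid `freqs` (Hz)
    centers        : channel center frequencies (Hz), any spacing
    Each row of W is one channel's power response |H_k(f)|^2, modeled as a
    unit-peak Gaussian whose ERB equals erb(fc).
    """
    freqs = np.asarray(freqs, dtype=float)
    centers = np.asarray(centers, dtype=float)[:, None]
    W = np.exp(-np.pi * ((freqs[None, :] - centers) / erb(centers)) ** 2)
    df = freqs[1] - freqs[0]
    # Riemann sum approximating the integral of |H_k(f)|^2 P(f) df.
    return W @ np.asarray(power_spectrum, dtype=float) * df
```

With a flat input spectrum of unit density, each sample equals the channel's ERB, so the wider high-frequency channels collect proportionally more power; a single spectral line placed exactly at a channel's center frequency is returned by that channel with unit weight.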
Spectrogram of Speech