DSPRelated.com
Forums

Ear integration time and

Started by mot56k June 20, 2009
Good evening,

reading Julius O. Smith's "Spectral Audio Signal Processing" I couldn't
understand the following:

"FIR filters shorter than the ear's integration time can generally be
characterized by their magnitude frequency response (no perceivable "delay
effects"). The nominal "integration time" of the ear can be defined as the
reciprocal of a critical bandwidth of hearing. Using Zwicker's definition
of critical bandwidth [278], the smallest critical bandwidth of hearing is
approximately 100 Hz (below 500 Hz). Thus, the nominal integration time of
the ear is 10ms below 500 Hz. (Using the equivalent-rectangular-bandwidth
(ERB) definition of critical bandwidth, longer values are obtained). At a
50 kHz sampling rate, this is 500 samples. Therefore, FIR filters shorter
than the ear's "integration time", i.e., perceptually "instantaneous", can
easily be hundreds of taps long (as discussed in the next section). FFT
convolution is consequently an important implementation tool for FIR
filters in digital audio applications."
(http://ccrma-www.stanford.edu/~jos/sasp/Audio_FIR_Filters.html)
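For what it's worth, the FFT-convolution remark at the end of the quoted passage is easy to demonstrate numerically. A minimal sketch (the signal and filter values are arbitrary):

```python
import numpy as np

# Convolve a signal with a 500-tap FIR filter (the "perceptually
# instantaneous" length at a 50 kHz sampling rate) by multiplying
# zero-padded spectra, and compare against direct convolution.
rng = np.random.default_rng(0)
x = rng.standard_normal(2048)   # an input block
h = rng.standard_normal(500)    # a 500-tap FIR impulse response

# Direct time-domain convolution: O(N*M) multiply-adds
y_direct = np.convolve(x, h)

# FFT convolution: pad to the full output length, multiply spectra,
# inverse-transform: O(N log N)
n = len(x) + len(h) - 1
y_fft = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)

# The two agree to numerical precision
max_err = np.max(np.abs(y_direct - y_fft))
```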

I've understood what the critical bandwidth is (please correct me if I'm
wrong). Depending on the frequency we are considering, there is a critical
bandwidth near it, that is, an interval of frequencies within which two
simultaneous tones can't be heard as distinct. This critical band changes
with frequency according to the rule: delta(f) = 0.3 * f^0.9

But now, what does this have to do with time? What is the ear's integration
time? Is it the minimum time in which we can perceive a single "transient"
or (after Gabor) a single "grain" of sound? There's something I'm missing
here, definitely. The next question is why it is 10 ms below 500 Hz: is
this based upon experimental results?

Thanks in advance
Alessandro

On 20 Jun, 16:00, "mot56k" <alessandro.sacc...@gmail.com> wrote:

> But now, what does this have to do with time?
There is a relation between bandwidth and time: the shorter the duration
of a transient in the time domain, the wider its bandwidth in the frequency
domain. And vice versa: given a bandwidth in the frequency domain, there is
an associated time interval.

The exact details might differ (where audio is concerned, the data might
very well be empirical), but the general principle is known as the
'time-bandwidth product'.

Rune
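The reciprocal relation above reproduces the numbers in the quoted passage directly. A minimal numeric sketch:

```python
# The ear's nominal integration time as the reciprocal of the smallest
# critical bandwidth, using the figures from the quoted passage.
cb_hz = 100.0          # smallest critical bandwidth of hearing, ~100 Hz
t_int = 1.0 / cb_hz    # nominal integration time of the ear
print(t_int)           # 0.01 s, i.e. 10 ms

fs = 50_000                    # sampling rate from the quoted passage
taps = int(round(t_int * fs))  # 500 samples: FIR filters up to roughly
print(taps)                    # this length are "perceptually instantaneous"
```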
mot56k <alessandro.saccoia@gmail.com> wrote:
 
> reading Julius O. Smith's "Spectral Audio Signal Processing" I couldn't
> understand the following:
> "FIR filters shorter than the ear's integration time can generally be
> characterized by their magnitude frequency response (no perceivable "delay
> effects"). The nominal "integration time" of the ear can be defined as the
> reciprocal of a critical bandwidth of hearing.
As I understand it, there are two paths for the signal out of the ear.

One goes to the sound/music/pitch processing that you are considering.
The other is for direction finding, which compares the arrival time of the
signals from each ear in some complicated way. The integration time of the
two is likely different.

For low and mid frequencies, there is one nerve impulse at a constant
point on the sine wave. For higher frequencies, they are still at a
constant point, but every Nth cycle, where N may or may not be constant
at a given frequency.

-- glen
On Jun 20, 12:56 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu>
wrote:
> (snip)
The simplest view of integration time comes from the following experiment.
Create a sound file with two impulses, and vary the spacing of the
impulses. Above a given spacing, you will hear two distinct clicks. Below
this spacing, you will hear a single click, but the tonal character of
this single click will be altered and will change as the spacing varies.
The point at which the impulses stop sounding like separate events and
start sounding like a single event happens at a spacing of about 20 to
40 ms.

So this means that FIR filters shorter than this interval will change the
perceived spectrum of the sound but not produce audible "echo" events.
FIR filters longer than 20 ms have the potential to create audible "tails"
on impulsive sounds.

It's a bit more complicated than this in that the ear has time-domain
"masking" properties; a large impulse that is followed by lower-amplitude
ringing may not have perceptible time-domain effects, but if you reverse
the signal so that the low-amplitude portion happens before the
high-amplitude portion, you may be able to hear some effects. This is
referred to as forward and backward temporal masking. As a practical
matter this means that minimum-phase filters (which have their largest
impulse early, followed by a decaying impulse response) will sound better
(fewer audible time-domain effects) than their opposite (maximum-phase
filters). This does point out the silliness of the audiophile focus on
linear-phase filters, but that's for another thread.

Now it gets a bit more complicated when you talk about critical
bandwidths. Critical bandwidths are determined by playing a loud sine
wave at a particular frequency, then introducing smaller nearby
frequencies and varying their amplitudes until they become noticeable.
You will find that the large signal "masks" the presence of nearby
smaller signals. But if the interfering signals are far away in
frequency, then the masking effect is diminished.
Were it not for this effect, MP3 compression would not be possible :)
However, these masking curves show that the ear has neither
constant-linear-frequency resolution nor constant-log-frequency
resolution; rather, it is approximately constant-Q above 500 Hz or so,
but below 500 Hz it is approximately constant-bandwidth.

Now here is where I get a little confused, and maybe someone more
knowledgeable than myself can jump in. If you were to repeat the click
experiment with band-limited clicks, filtered to different frequency
ranges, you might expect that the fusion threshold for the high-pass-
filtered clicks would be shorter than that for the low-pass-filtered
clicks, due to the wider bandwidth of the ear in this range. So going
back to our click experiment: if you were to band-limit the clicks to,
say, 5 kHz to 20 kHz, and again varied the spacing until the two separate
clicks merged into one, you would expect a SHORTER fusion time. But I
don't think this is true.

One way of explaining this is to model the ear as a bank of bandpass
filters followed by full-wave rectification and then an averaging filter.
The click fusion experiment then depends not only on the filter
bandwidth, but also on the time constant of the "detector". If the
detector time constant is around 20 ms, it would explain why the
perceived fusion time is not strongly frequency-dependent. (I do not know
what the physical mechanism of this "detector" is.)

Bob Adams
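The two-click experiment and the (bandpass, rectify, average) detector model described above can be sketched in a few lines. The band edges, filter order, and 20 ms averaging window below are illustrative assumptions, not measured ear parameters:

```python
import numpy as np
from scipy import signal

fs = 48_000

def two_clicks(spacing_s, dur_s=0.1):
    """A sound vector containing two unit impulses spacing_s apart."""
    x = np.zeros(int(dur_s * fs))
    x[0] = 1.0
    x[int(spacing_s * fs)] = 1.0
    return x

# One "critical band" channel: a Butterworth bandpass around 500 Hz
sos = signal.butter(4, [400, 600], btype='bandpass', fs=fs, output='sos')

# Detector: full-wave rectification followed by a ~20 ms moving average
avg_len = int(0.020 * fs)
avg = np.ones(avg_len) / avg_len

def detector_output(x):
    return np.convolve(np.abs(signal.sosfilt(sos, x)), avg)

def count_lumps(env):
    """Count contiguous regions where the envelope exceeds half its peak."""
    mask = env > 0.5 * env.max()
    return int(mask[0]) + int(np.sum(np.diff(mask.astype(int)) == 1))

# 2 ms spacing, well inside the detector window: the clicks fuse into one
# lump. 50 ms spacing: two distinct lumps, i.e. two separate events.
fused = count_lumps(detector_output(two_clicks(0.002)))
apart = count_lumps(detector_output(two_clicks(0.050)))
```

Sweeping `spacing_s` and watching where `count_lumps` drops from 2 to 1 gives a crude stand-in for the perceptual fusion threshold; with these assumed constants the crossover is governed mostly by the 20 ms averaging window, which is the point about the detector time constant.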
On Sat, 20 Jun 2009 16:56:50 +0000 (UTC), glen herrmannsfeldt
<gah@ugcs.caltech.edu> wrote:

>mot56k <alessandro.saccoia@gmail.com> wrote:
>> (snip)
>
>As I understand it, there are two paths for the signal out of the ear.
>
>One goes to the sound/music/pitch processing that you are considering.
>The other is for direction finding, which compares the arrival time
>of the signals from each ear in some complicated way.
Sounds reasonable so far.
>The integration
>time of the two is likely different.
>
>For low and mid frequencies, there is one nerve impulse at
>a constant point on the sine wave. For higher frequencies, they
>are still at a constant point, but every Nth cycle, where N
>may or may not be constant at a given frequency.
This is quite different from my (probably imperfect) understanding of how
hearing works. AIUI, the inner sensor of the ear (the cochlea) is
essentially a spectrum analyzer, with a long row of fine hairs (cilia),
each of which is resonant at some frequency from 20 to 20,000 Hz
nominally. The cilia have nerves at their bases which send periodic
impulses to the brain at a rate nominally proportional to the vibration
level of the associated cilia at their associated resonant frequencies; a
Fourier transform bin level of sorts. (There is also a sort of AGC acting
here, where the nerve firing rate depends on cilia vibration level
history.)

The 10 ms time constant is for the response time of the nerves at the
sensing cilia. There is no sampling of sound pressure levels, only
measurement of the mechanical spectrum analyzer's bin levels, and all
post-processing, including direction estimation and frequency estimation
(bin interpolation), occurs after the 10 ms sensor time constant of the
ear.

A brief web search turned up no decent description of how the ear works,
but also nothing contradicting my understanding, which was accumulated
from various unremembered sources, although I do recall a discussion of
the measurement of the ear's time constant being included in:

Of Acoustics and Instruments - Memoirs of a Danish Pioneer;
Per V. Brüel and Harry K. Zaveri

Available at http://www.sandv.com/home.htm (alas, no direct link to the
article: select Downloads - Back Issues, Feb 08 for part 1 and Aug 08 for
part 2, and select the article). The discussion of hearing is in part 2;
both parts are IMO interesting.

Regards,
Glen
Glen Walpert wrote:
> (snip)
The mechanics of the ear and the analytic processes in the brain require
different explanations. I think you are essentially correct in how the
ear acquires information. Glenn roughly described how that information is
processed.

Jerry
--
Engineering is the art of making what you want from things you can get.
Glen Walpert <nospam@null.void> wrote:
(snip, I wrote)

<>For low and mid frequencies, there is one nerve impulse at
<>a constant point on the sine wave.  For higher frequencies, they
<>are still at a constant point, but every Nth cycle, where N
<>may or may not be constant at a given frequency.  
 
< This is quite different from my (probably imperfect) understanding of
< how hearing works.  AIUI, the inner sensor of the ear (the cochlea) is
< essentially a spectrum analyzer, with a long row of fine hairs (cilia)
< each of which is resonant at some frequency from 20 to 20,000 Hz
< nominally.  The cilia have nerves at their bases which send periodic
< impulses to the brain at a rate nominally proportional to the
< vibration level of the associated cilia, with it's associated resonant
< frequency; a Fourier transform bin level of sorts.  (There is also a
< sort of AGC acting here where the nerve firing rate depends on cilia
< vibration level history).

The part I was trying to describe, told to me by someone who 
actually does the experimental work (on owls), is that the nerve 
impulses are synchronous with the input sine.  If they weren't, 
the phase difference for direction sensing would be very hard to do.  
For the higher frequency hair cells, even at maximum input, they
can't fire on each cycle.  Low frequency ones can.

Also, it is very difficult to make high-Q wet resonators.
My understanding, though in this case not directly from an actual
researcher, is that the effect is similar to that of ocean waves
on a beach.

The velocity of a water surface wave depends on the depth.
As a wave approaches a beach it slows down.  At some point, 
depending on the frequency, the back of the wave catches up
with the front (my description), resulting in what beach
visitors call breakers.  As I understand the cochlea, it has
a similar variable velocity structure, resulting in an amplitude
peak depending on frequency.  Even so, the Q is much lower than
one would like, which signal processing has to correct.  

< The 10 ms time constant is for the response time of the nerves at the
< sensing cilia.  There is no sampling of sound pressure levels, only
< measurement of the mechanical spectrum analyzer bin levels, and all
< post-processing including direction estimation and frequency
< estimation (bin interpolation) occurs after the 10 ms sensor time
< constant of the ear has occurred.

I am not sure I understand the 10ms yet, but that is okay.
It is my understanding that the nerve impulses are synchronous
to the input.  That is, completely unrelated to this discussion,
phase information is preserved.  I believe that phase is needed
for the direction sensing and is pretty much not used in the
musical processing part of the brain.  There used to be a
big discussion among audiophiles on absolute phase.
That was the reason I asked the researcher some years ago
about the connection between input sound and nerve impulses.

-- glen
 
On Mon, 22 Jun 2009 18:50:32 +0000 (UTC), glen herrmannsfeldt
<gah@ugcs.caltech.edu> wrote:

>(snip)
OK, now I see what you meant. I think that there are probably significant
differences between owl and human hearing, as owls are much better at
localizing sound sources; human hearing probably preserves less phase
information, as the time constant is larger.

Dr. Brüel's article I referenced claims time constants of 100 ms (at high
frequencies) to 150 ms (at low frequencies) for human hearing, based on
measured differences in actual versus perceived loudness of sounds as
their duration is varied. Sounds much shorter than the time constant
sound much less loud; at the time constant they appear to be at half
loudness (-3 dB), and at 3 time constants there is no discernible
difference in loudness (compared to the apparent loudness of a continuous
tone at the same actual loudness). This is analogous to the step response
of a first-order low-pass filter.
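The first-order low-pass analogy above can be put into numbers. A sketch that models the apparent loudness of a tone burst of duration T, relative to a continuous tone, as the step response of a one-pole filter; the 120 ms time constant is an assumed value inside the quoted 100-150 ms range:

```python
import math

tau = 0.120  # seconds (assumed, between the quoted 100 and 150 ms)

def relative_loudness_db(T):
    """Apparent level of a T-second burst, in dB re a continuous tone."""
    return 10 * math.log10(1 - math.exp(-T / tau))

for T in (0.030, tau, 3 * tau):
    print(f"{T * 1000:5.0f} ms burst: {relative_loudness_db(T):6.2f} dB")
```

This simple model puts a burst of one time constant about 2 dB down rather than the quoted 3 dB, so treat it as an analogy rather than a fit; at three time constants it is within a fraction of a dB of the continuous tone, consistent with the "no discernible difference" observation.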
On Sat, 20 Jun 2009 17:58:35 -0700 (PDT), Robert Adams
<robert.adams@analog.com> wrote:
>As a practical matter this means that minimum-phase filters (which
>have their largest impulse early, followed by a decaying impulse
>response) will sound better (fewer audible time-domain effects) than
>their opposite (maximum-phase filters). This does point out the
>silliness of the audiophile focus on linear-phase filters, but that's
>for another thread.
Robert, could you please elaborate on this a bit more?

Regards,
Goran Tomas
On Jun 23, 3:44 am, Goran Tomas <goran.tomasN...@Memail.htnet.hr>
wrote:
> On Sat, 20 Jun 2009 17:58:35 -0700 (PDT), Robert Adams
> <robert.ad...@analog.com> wrote:
> (snip)
>
> Robert, could you please elaborate on this a bit more?
also, i thought the audiophile preference for linear-phase filters was so
that the onsets of impulsive sounds would be time-aligned and the stereo
(or multichannel) perceived localization of sound would be predictable.

i know that minimum-phase filters will likely not have pre-echoes and
linear-phase filters *could* (but would not have to) have pre-echoes
(which are bad for audiophiles), but i can see why, in multichannel audio
chains, linear phase may appear preferable to minimum phase.

r b-j
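The minimum-phase vs. linear-phase distinction discussed above is easy to see in the impulse responses. A sketch with an arbitrary example filter: a linear-phase FIR is symmetric, so half its energy arrives before the main peak (the potential pre-echo), while a minimum-phase version front-loads the energy. Note that scipy's homomorphic `minimum_phase` by default returns a half-length filter whose magnitude approximates the square root of the original's, which is fine for this qualitative comparison of where the energy sits:

```python
import numpy as np
from scipy import signal

fs = 48_000
h_lin = signal.firwin(255, 4_000, fs=fs)   # linear-phase lowpass example
h_min = signal.minimum_phase(h_lin)        # minimum-phase counterpart

# Where does the largest tap sit?
peak_lin = int(np.argmax(np.abs(h_lin)))   # at the center of the filter
peak_min = int(np.argmax(np.abs(h_min)))   # near the very start
```

The symmetric `h_lin` peaks at its midpoint (sample 127 of 255), so any impulsive input "rings" for 127 samples before the main event; `h_min` peaks within the first few samples, which is why it avoids audible pre-echo at the cost of non-constant group delay across channels.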