DSPRelated.com
Forums

HRTF approximation with FIR or IIR filters

Started by Max November 11, 2016
Has anyone here worked on HRTF-related projects?  I'm interested in
very fast, compact algorithms that cover the bases without getting too
CPU-heavy.  That would rule out impulse libs and such.

I suspect that quite a bit of the effect could be accomplished by
simple low-order IIR or FIR filters--perhaps mapping 6 to 8 filters at
uniform angles for each ear (every 60 or 45 degrees, respectively).

I presume that something like this has been done, but I've never heard
anything about how effective the results are, compared to the standard
impulse lib approach.
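For concreteness, here's a rough sketch of the kind of sector map I mean
(Python; the function name and the 0-degrees-is-front convention are just
placeholders, not anything from the literature): snap an azimuth to the
nearest of N uniformly spaced filter slots, and let each slot select a
pre-designed low-order filter pair instead of convolving a measured HRIR.

```python
import math

def sector_index(azimuth_deg, n_sectors=8):
    """Snap an azimuth (degrees; 0 = straight ahead, increasing
    clockwise) to the nearest of n_sectors uniformly spaced filter
    slots -- e.g. 8 slots = one pre-designed filter every 45 degrees,
    6 slots = one every 60 degrees."""
    step = 360.0 / n_sectors
    return int(round((azimuth_deg % 360.0) / step)) % n_sectors

# Each index would select a precomputed low-order IIR/FIR pair
# (one filter per ear) in place of a full impulse-library lookup.
```

Crossfading between adjacent sectors as a source moves would hide the
switching, but the table itself stays tiny.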
On Fri, 11 Nov 2016 05:15:48 -0500, Max <Max@sorrynope.com> wrote:

>Has anyone here worked on HRTF-related projects?  I'm interested in
>very fast, compact algorithms that cover the bases without getting too
>CPU-heavy.  That would rule out impulse libs and such.
>
>I suspect that quite a bit of the effect could be accomplished by
>simple low-order IIR or FIR filters--perhaps mapping 6 to 8 at uniform
>angles for each ear (every 60 or 45 degrees, resp).
I implemented the complete CIPIC array

  http://interface.cipic.ucdavis.edu/sound/hrtf.html
  interface.cipic.ucdavis.edu/pubs/WASSAP_2001_143.pdf

which places a different HRTF approximately every 5° to 6° in azimuth
and elevation. Many people could find at least one of the CIPIC measured
HRTFs that worked for them. But for a significant number of people,
*none* of them worked.
>I presume that something like this has been done, but I've never heard
>anything about how effective the results are, compared to the standard
>impulse lib approach.
Well, if the standard impulse lib approach, spaced every 5°, doesn't
work very well, then a coarse array spaced 45° to 60° probably won't
work any better.

Greg
On Fri, 11 Nov 2016 06:49:55 -0600, Greg Berchin
<gjberchin@chatter.net.invalid> wrote:

>On Fri, 11 Nov 2016 05:15:48 -0500, Max <Max@sorrynope.com> wrote:
>
>>Has anyone here worked on HRTF-related projects?  I'm interested in
>>very fast, compact algorithms that cover the bases without getting too
>>CPU-heavy.  That would rule out impulse libs and such.
>I implemented the complete CIPIC array
> http://interface.cipic.ucdavis.edu/sound/hrtf.html
> interface.cipic.ucdavis.edu/pubs/WASSAP_2001_143.pdf
>which places a different HRTF approximately every 5° to 6° in azimuth
>and elevation. Many people could find at least one of the CIPIC measured
>HRTFs that worked for them. But for a significant number of people,
>*none* of them worked.
Well, that's not encouraging.  My understanding, though, is that
azimuth is more likely to work, but elevation is hit or miss.  I'm
going for azimuth only, and just trying to supplement a room modeling
algorithm for better headphone response--IOW, hopefully shifting the
focal point so that headphone playback registers a bit more like
speakers.
>>I presume that something like this has been done, but I've never heard
>>anything about how effective the results are, compared to the standard
>>impulse lib approach.
>
>Well, if the standard impulse lib approach, spaced every 5°, doesn't
>work very well, then a coarse array spaced 45° to 60° probably won't
>work any better.
>
>Greg
Yeah, I guess not! It sounds like you've spent a lot of time testing. Did you have significant problems with azimuth? Cool to hear that you've done that, Greg. How did you do the impulse-to-filter conversion? Is it online anywhere?
On Fri, 11 Nov 2016 08:56:17 -0500, Max <Max@sorrynope.com> wrote:

>Well that's not encouraging. My understanding, though, is that azimuth
>is more likely to work, but elevation is hit or miss.
Human hearing is much more sensitive to variations in azimuth than in
elevation. But that is common knowledge, I think.
> Yeah, I guess not!  It sounds like you've spent a lot of time
>testing.  Did you have significant problems with azimuth?
I'm not certain what you mean by "significant" problems. For some the biggest problem was front-to-back reversal -- that's pretty significant.
>Cool to hear that you've done that, Greg.  How did you do the
>impulse-to-filter conversion?  Is it online anywhere?
It's all proprietary, so I am not at liberty to discuss it. Sorry.

Greg
On Fri, 11 Nov 2016 15:37:18 -0600, Greg Berchin
<gjberchin@chatter.net.invalid> wrote:

>On Fri, 11 Nov 2016 08:56:17 -0500, Max <Max@sorrynope.com> wrote:
>
>>Well that's not encouraging. My understanding, though, is that azimuth
>>is more likely to work, but elevation is hit or miss.
>
>Human hearing is much more sensitive to variations in azimuth than in
>elevation. But that is common knowledge, I think.
That's why I wasn't considering elevation.  The objective is simply to
deliver sounds via headphones that sound like in-room speakers, so
elevation shouldn't play a direct part.

Given that most azimuth info is provided by ITD and ILD, I was looking
simply to incorporate left-right 'head shadow' effects rather than
trying to model pinnae reflections.  Pinnae modeling would probably
require lots of CPU, and a difficult level of user interface (how
would you choose the correct pinnae model, etc).  As you point out, it
may not work anyway.

So I thought that most of the functionality could be achieved by doing
relatively precise calculations for ITD, then relegating ILD and
filtering for head-shadow to the 60 degree or 45 degree sector map
that I referred to.  Does that make more sense?  That may be all the
CPU cycles that I can afford anyway.  The reflection algs will be
expensive.
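By "relatively precise calculations for ITD" I mean something along the
lines of the classic rigid-sphere (Woodworth) approximation.  A quick
sketch in Python--the head radius and sign convention are my own
assumptions, not anything standardized:

```python
import math

def woodworth_itd(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Woodworth rigid-sphere ITD approximation, valid for azimuths
    between 0 and 90 degrees off the median plane:
        ITD = (a / c) * (theta + sin(theta))
    Returns seconds; roughly 0.656 ms at 90 degrees for an
    8.75 cm head radius."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / c) * (theta + math.sin(theta))

# The far ear then just gets a fractional- or integer-sample delay:
fs = 44100
delay_samples = woodworth_itd(45.0) * fs  # about 16.8 samples at 44.1 kHz
```

At these magnitudes a fractional-delay interpolator matters more than
the formula itself, but the arithmetic cost is trivial either way.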
>> Yeah, I guess not!  It sounds like you've spent a lot of time
>>testing.  Did you have significant problems with azimuth?
>
>I'm not certain what you mean by "significant" problems. For some the
>biggest problem was front-to-back reversal -- that's pretty significant.
Oh, that would indeed be significant.  I was hoping that the only
miscue problems were with elevation.  If that problem was not caused
by some odd side-effect of the in-depth pinnae filtering, then it may
occur even with the roughly modeled head-shadow approach.  Did you get
any feel for what could have caused it?

I've heard some speculation that some of the front-back problem could
be due to one of those subtle low-level brain connections between the
visual and auditory systems: when the ears get audio info indicating
direct front, but the eyes can't locate the source, the brain suggests
that the sound is coming from the back.  I believe I first saw that
mentioned on the site that you linked to.
>>Cool to hear that you've done that, Greg.  How did you do the
>>impulse-to-filter conversion?  Is it online anywhere?
>
>It's all proprietary, so I am not at liberty to discuss it. Sorry.
>
>Greg
Too bad--I would have loved to see how you approached that.  Are you
at liberty to say what platform you were running on, or how
CPU-intensive the algorithm was?  I would expect considerable savings
over the direct impulse convolution approach.

There are some AES papers that touch on conversion to IIR/FIR, but I
haven't found anything directly relevant yet.  I was hoping for some
simple averaged filters that would just get head-shadow volume and
frequency effects in range, rather than trying complex elevation/pinna
mapping.  Again, hoping that synth'd room reflections would provide
the needed spatial cues.

I've also heard that reflections from side walls are very important in
determining sense of depth (and perhaps relate to front-back).  Did
you consider trying that?  Even low-level lateral cues may help.

This is an interesting subject--way deeper than I suspected when I
first started researching it.  That humans use reflections off
shoulders for spatial judgement?  That's a complex evolutionary step!
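For the impulse-to-filter conversion, the crudest starting point I can
think of (a sketch of my own, certainly not anyone's production method)
is just truncating and windowing the HRIR down to a short FIR, while
keeping the onset delay separate so the ITD survives as a plain delay:

```python
import numpy as np
from scipy.signal import get_window

def truncate_hrir(hrir, n_taps=32):
    """Crude HRIR -> short FIR: find the n_taps-long segment holding
    the most energy, taper it with a Hann window, and report its start
    index so the onset delay (the ITD contribution) can be applied
    separately as a simple delay line."""
    hrir = np.asarray(hrir, dtype=float)
    # Sliding-window energy over all length-n_taps segments.
    energy = np.convolve(hrir ** 2, np.ones(n_taps), mode='valid')
    start = int(np.argmax(energy))
    taps = hrir[start:start + n_taps] * get_window('hann', n_taps)
    return taps, start
```

A 32-tap FIR per ear per sector is a few dozen MACs per sample--a long
way down from convolving 200-sample measured HRIRs.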
On Friday, November 11, 2016 at 7:09:02 PM UTC-6, Max wrote:

> Given that most azimuth info is provided by ITD and ILD, I was looking
> simply to incorporate left-right 'head shadow' effects rather than
> trying to model pinnae reflections.  Pinnae modeling would probably
> require lots of CPU, and a difficult level of user interface (how
> would you choose the correct pinnae model, etc).
The CIPIC database sidesteps the modeling problem by simply measuring
the HRIRs of a whole bunch of people. You might get some typical values
for ITD and ILD from analysis of the CIPIC data.
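For example--a sketch, assuming you have already loaded one subject's
left/right HRIR pair as arrays--broadband ITD and ILD estimates fall
out of a cross-correlation peak and an energy ratio:

```python
import numpy as np

def itd_ild_from_hrirs(h_left, h_right, fs=44100.0):
    """Broadband ITD/ILD estimates from one left/right HRIR pair.
    ITD comes from the lag of the cross-correlation peak (a positive
    lag means the left HRIR is delayed, i.e. the source is toward the
    right ear); ILD is the left/right energy ratio in dB."""
    h_left = np.asarray(h_left, dtype=float)
    h_right = np.asarray(h_right, dtype=float)
    xc = np.correlate(h_left, h_right, mode='full')
    lag = int(np.argmax(np.abs(xc))) - (len(h_right) - 1)
    itd = lag / fs
    ild_db = 10.0 * np.log10(np.sum(h_left ** 2) / np.sum(h_right ** 2))
    return itd, ild_db
```

Averaging these over many CIPIC subjects at each azimuth would give the
"typical" curves to drive a cheap delay-plus-gain model.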
> As you point out, it may not work anyway.  So I thought that most of
> the functionality could be achieved by doing relatively precise
> calculations for ITD, then relegating ILD and filtering for head-
> shadow to the 60 degree or 45 degree sector map that I referred to.
> Does that make more sense?
I wish that it was that simple. We found that HRTFs/HRIRs are very
specific to individuals. We had significant success at making them more
generic, but it took a lot of proprietary "secret sauce" to do it.
> >I'm not certain what you mean by "significant" problems. For some the
> >biggest problem was front-to-back reversal -- that's pretty significant.
>
> Oh, that would indeed be significant.  I was hoping that the only
> miscue problems were with elevation.  If that problem was not caused
> by some odd side-effect of the in-depth pinnae filtering, then it may
> occur even with the roughly modeled head-shadow approach.
> Did you get any feel for what could have caused it?
No. For one of the subjects who had persistent front/rear ambiguity
problems (that would be me), imaging in front of the head was
extraordinarily difficult to achieve, even with true binaural
recordings.
> I've heard some speculation that some of the front-back problem could
> be due to one of those subtle low-level brain connections between the
> visual and audio systems: What the ears get audio info intended for
> direct front, but the eyes can't locate the source, the brain puts in
> the suggestion that the sound is from the back.
There may be some truth to that. I recall that when I auditioned the
Smyth Realiser, I could achieve at least some "externalization" as long
as my eyes were open. But the moment I closed my eyes, all
externalization collapsed back to typical inside-the-head localization.
> Too bad--I would have loved to see how you approached that.  Are you
> at liberty to say what platform you were running on, or how CPU-
> intensive the algorithm was?  I would expect considerable savings over
> the direct impulse convolution approach.
Sorry; I cannot say anything about the implementation.
> I've also heard that reflections from side walls are very important in
> determining sense of depth (and perhaps relate to front-back).  Did
> you consider trying that?  Even low-level lateral cues may help.
As above.
> This is an interesting subject--way deeper than I suspected when I
> first started researching it.  That humans use reflections off
> shoulders for spatial judgement?  That's a complex evolutionary step!
It is said that some blind people can echolocate. I find that
fascinating.

Greg
On Fri, 11 Nov 2016 19:05:17 -0800 (PST), Greg Berchin
<gjberchin@charter.net> wrote:

>On Friday, November 11, 2016 at 7:09:02 PM UTC-6, Max wrote:
>
>> Given that most azimuth info is provided by ITD and ILD, I was looking
>> simply to incorporate left-right 'head shadow' effects rather than
>> trying to model pinnae reflections.  Pinnae modeling would probably
>> require lots of CPU, and a difficult level of user interface (how
>> would you choose the correct pinnae model, etc).
>
>The CIPIC database sidesteps the modeling problem by simply measuring
>the HRIRs of a whole bunch of people. You might get some typical values
>for ITD and ILD from analysis of the CIPIC data.
Hi Greg,

I found some HRTF data at MIT, IRCAM, and of course the CIPIC
database.  But I'm not sure that I'll have enough CPU time to make use
of any impulse libraries.  Or were you suggesting something else?
>> As you point out, it may not work anyway.  So I thought that most of
>> the functionality could be achieved by doing relatively precise
>> calculations for ITD, then relegating ILD and filtering for head-
>> shadow to the 60 degree or 45 degree sector map that I referred to.
>> Does that make more sense?
>
>I wish that it was that simple. We found that HRTFs/HRIRs are very
>specific to individuals. We had significant success at making them more
>generic, but it took a lot of proprietary "secret sauce" to do it.
I was mostly curious about generating 'out-of-head' headphone signals,
rather than fine-tuning exact spatial positioning.  But I have just
started to research it.  I was figuring that the room reflections and
a roughly computed head-shadow/ITD/ILD model would get most of the way
there.  Are you saying that individualized pinna reflections are
absolutely essential to that?
>> >I'm not certain what you mean by "significant" problems. For some the
>> >biggest problem was front-to-back reversal -- that's pretty significant.
>>
>> Oh, that would indeed be significant.  I was hoping that the only
>> miscue problems were with elevation.  If that problem was not caused
>> by some odd side-effect of the in-depth pinnae filtering, then it may
>> occur even with the roughly modeled head-shadow approach.
>> Did you get any feel for what could have caused it?
>
>No. For one of the subjects who had persistent front/rear ambiguity
>problems (that would be me), imaging in front of the head was
>extraordinarily difficult to achieve, even with true binaural
>recordings.
You're lucky to have a good test subject so close at hand. :-) Perhaps the solution to the individualized pinna problem is to ship a pair of standardized rubber ears with the software, with instructions to wear them for three weeks before running the programs. :-) (I have heard that the brain will learn the new impulse response in about that time)
>> I've heard some speculation that some of the front-back problem could
>> be due to one of those subtle low-level brain connections between the
>> visual and audio systems: What the ears get audio info intended for
>> direct front, but the eyes can't locate the source, the brain puts in
>> the suggestion that the sound is from the back.
>
>There may be some truth to that. I recall that when I auditioned the
>Smyth Realiser, I could achieve at least some "externalization" as long
>as my eyes were open. But the moment I closed my eyes, all
>externalization collapsed back to typical inside-the-head localization.
That's odd.  It's kind of backwards from the speculation in the CIPIC
papers.  Intuitively, you'd think the image would be more convincing
with eyes closed, so the brain doesn't insert its own
optically-derived notion of where sounds originate.
>> Too bad--I would have loved to see how you approached that.  Are you
>> at liberty to say what platform you were running on, or how CPU-
>> intensive the algorithm was?  I would expect considerable savings over
>> the direct impulse convolution approach.
>
>Sorry; I cannot say anything about the implementation.
Now I'm really curious! From what I've seen of your work (the paper on filters), I presume you've done some very cool things. If you're free to post anything in the future, even about licensing, please follow up.
>> I've also heard that reflections from side walls are very important in
>> determining sense of depth (and perhaps relate to front-back).  Did
>> you consider trying that?  Even low-level lateral cues may help.
>
>As above.
>
>> This is an interesting subject--way deeper than I suspected when I
>> first started researching it.  That humans use reflections off
>> shoulders for spatial judgement?  That's a complex evolutionary step!
>
>It is said that some blind people can echolocate. I find that
>fascinating.
>
>Greg
I wouldn't be surprised if we do a bit of that ourselves at times.  My
father has worked with bats at one of the large caves in TX, and he
describes amazing spatial acuity.  I suppose it would have to be, in
order to catch mosquitoes and such.

At this point, I'm still assuming that I won't have enough CPU to do
any of the fun stuff that you've done, but who knows.  For a start,
I'm thinking of just trying rough head-shadow, as I mentioned.  Any
suggestions for approaching that?  Do you know of any approximate
filter models?  I haven't seen anything that simple, perhaps because
it's not much fun for researchers.
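About the simplest published candidate I've come across so far is the
one-pole/one-zero spherical head-shadow approximation from Brown and
Duda's structural model.  My own rough attempt at discretizing it via
the bilinear transform follows--treat the details as my reading of the
paper, not gospel:

```python
import math

def head_shadow_coeffs(theta_deg, fs=44100.0, a=0.0875, c=343.0):
    """One-pole/one-zero head-shadow filter after Brown & Duda (1998):
    analog H(s) = (alpha*s + 2*w0) / (s + 2*w0) with w0 = c/a, mapped
    to digital via the bilinear transform (no prewarping).
    theta_deg is the angle between the source and this ear's axis
    (0 = source on this ear's side, ~150 = deepest shadow).
    Returns ((b0, b1), a1) for y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1]."""
    w0 = c / a
    # Brown & Duda's angle-dependent zero: alpha in [0.1, 2.0].
    alpha_min, theta_min = 0.1, 150.0
    alpha = (1.0 + alpha_min / 2.0) + (1.0 - alpha_min / 2.0) * math.cos(
        math.radians(theta_deg) * (180.0 / theta_min))
    k = 2.0 * fs  # bilinear-transform constant
    b0 = (2.0 * w0 + alpha * k) / (2.0 * w0 + k)
    b1 = (2.0 * w0 - alpha * k) / (2.0 * w0 + k)
    a1 = (2.0 * w0 - k) / (2.0 * w0 + k)
    return (b0, b1), a1
```

It has unity gain at DC for every angle; at high frequencies the
ipsilateral ear gets up to +6 dB while the shadowed side rolls off,
which roughly matches what a rigid sphere measures.  One of these per
ear, plus the ITD delay, is about as cheap as head-shadow gets.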
On Saturday, November 12, 2016 at 7:19:59 PM UTC-6, Max wrote:

> I found some HRTF data at MIT, IRCAM, and of course the CIPIC
> database.  But I'm not sure that I'll have enough CPU time to make use
> of any impulse libraries.  Or were you suggesting something else?
I was only relating that *I* implemented the CIPIC database as a starting point. Ultimately what I did went WAY beyond that -- that's the stuff that I cannot talk about.
> I was mostly curious about generating 'out-of-head' headphone signals,
> rather than fine-tuning exact spatial positioning.  But I have just
> started to research it.  I was figuring that the room reflections and
> roughly computed head-shadow/ITD/ILD model would get most of the way
> there.  Are you saying that individualized pinna reflections are
> absolutely essential to that?
I cannot say that they are absolutely essential; that is for the academic researchers to decide. But I can say that using "someone else's HRTF" does not seem to work for most people. I believe that a lot of the literature supports this observation.
> You're lucky to have a good test subject so close at hand.  :-)
Perhaps. But it was frustrating!
> Perhaps the solution to the individualized pinna problem is to ship a
> pair of standardized rubber ears with the software, with instructions
> to wear them for three weeks before running the programs.  :-)  (I have
> heard that the brain will learn the new impulse response in about that
> time)
Somebody actually marketed such a product many years ago. It was called "Serious Listeners".
> That's odd.  It's kind of backwards from the speculation in the CIPIC
> papers.  Intuitively, you'd think the image would be more convincing
> with eyes closed, so the brain doesn't insert its own optically-
> derived notion of where sounds originate.
I auditioned the Realiser while watching a motion picture, so the image provided the necessary cues.
> Now I'm really curious!  From what I've seen of your work (the paper
> on filters), I presume you've done some very cool things.
I've been in the biz for 35 years. Lots of opportunity to work on "very cool things".
> If you're free to post anything in the future, even about licensing,
> please follow up.
It is the nature of the non-academic engineering business that a huge quantity of knowledge is maintained as trade secrets.
> At this point, I'm still assuming that I won't have enough CPU to do
> any of the fun stuff that you've done, but who knows.  For a start,
> I'm thinking of just trying rough head-shadow, as I mentioned.
> Any suggestions for approaching that?  Do you know of any approximate
> filter models?  I haven't seen anything that simple, perhaps because
> it's not much fun for researchers.
Implementation issues (filter designs, computational load reduction)
can be solved with straightforward engineering. I simply cannot comment
about specifics of what I did with HRTFs or related concepts.

Greg
On Sat, 12 Nov 2016 19:27:17 -0800 (PST), Greg Berchin
<gjberchin@charter.net> wrote:

>On Saturday, November 12, 2016 at 7:19:59 PM UTC-6, Max wrote:
>> I was mostly curious about generating 'out-of-head' headphone signals,
>> rather than fine-tuning exact spatial positioning.
>I cannot say that they are absolutely essential; that is for the
>academic researchers to decide. But I can say that using "someone
>else's HRTF" does not seem to work for most people. I believe that a
>lot of the literature supports this observation.
I have heard that quite a bit.  More talk of errors in elevation,
though.  I was figuring that's because it's mostly dependent on the
shape of the pinnae, as opposed to azimuth, where everyone has a left
and right detector as a basic start.
>> Perhaps the solution to the individualized pinna problem is to ship a
>> pair of standardized rubber ears with the software, with instructions
>> to wear them for three weeks before running the programs.  :-)  (I have
>> heard that the brain will learn the new impulse response in about that
>> time)
>
>Somebody actually marketed such a product many years ago. It was called
>"Serious Listeners".
Doh! I thought I was joking. I can't imagine that it made a huge splash in the marketplace, unless they had a Spock model for Star Trek conventions. I couldn't find anything via Google.
>> If you're free to post anything in the future, even about licensing,
>> please follow up.
>
>It is the nature of the non-academic engineering business that a huge
>quantity of knowledge is maintained as trade secrets.
Yeah, I've worked on some. Those were generally not top secret, as they'd sometimes publish bullet points to pull in sales, even before development was complete. I thought maybe the company in this case might be marketing a related product. If not, I completely understand.
>> At this point, I'm still assuming that I won't have enough CPU to do
>> any of the fun stuff that you've done, but who knows.  For a start,
>> I'm thinking of just trying rough head-shadow, as I mentioned.
>> Any suggestions for approaching that?  Do you know of any approximate
>> filter models?  I haven't seen anything that simple, perhaps because
>> it's not much fun for researchers.
>
>Implementation issues (filter designs, computational load reduction)
>can be solved with straightforward engineering. I simply cannot comment
>about specifics of what I did with HRTFs or related concepts.
Very good, Greg.  I wasn't trying to press for proprietary info on
your own project; I've been trying to gain some insights from those
who have worked on HRTF in general.  I do appreciate your comments,
especially in that we've had this thread to ourselves.  I'm surprised.
Maybe people are still in shock after Terrible Tuesday... I dunno.
Are you aware of the work done by Prof. Blauert on this?
Keyword: "Blauert Bands"....


On 17.11.2016 06:51, Max wrote:
> <snip>