Hi everyone,
I'm quite new to DSP, and I'm programming a tool to optimize surround music for
listening on headphones, stereo speaker systems, etc. (Such tools probably
already exist, but I couldn't find any free software that does what I want.)
The best HRIR measurements I could find are those of the Listen project, and
with their help I reached results that, to my mind, are astonishing enough to
encourage me to delve a little deeper into this subject.
The project, however, provides impulse responses in stereo pairs for a very
limited set of angles - enough when you want to place the virtual speakers on a
circle around the listener, but not nearly sufficient if you want more freedom.
To remove this constraint, I figured that the stereo HRTF pairs should be
decoupled (thanks to
http://alumnus.caltech.edu/~franko/thesis/Chapter4.html#sub4), and that it would
be useful to interpolate for the missing directions.
My tool currently takes the existing HRTFs and upsamples them with SRC to match
the input audio's sample rate. Convolution is done in the frequency domain with
the help of FFTW's incredibly fast Fourier transforms. I still have some
questions - though it appears to work quite well - about the deconvolution
needed for headphone or stereo-equipment cancellation (including crossfeed),
but that is not what I would like to address in this post.
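To make the convolution step concrete, here is a minimal sketch in Python/numpy
(the tool itself uses the FFTW and SRC C libraries; the function and its
arguments are placeholders of my own):

    import numpy as np

    def fft_convolve(block, hrir):
        # Linear convolution of lengths N and M needs an FFT size of at
        # least N + M - 1; anything shorter wraps around (circular
        # convolution). In a streaming tool the overhanging tails of
        # successive blocks are summed (overlap-add).
        n = len(block) + len(hrir) - 1
        nfft = 1 << (n - 1).bit_length()            # next power of two
        spec = np.fft.rfft(block, nfft) * np.fft.rfft(hrir, nfft)
        return np.fft.irfft(spec, nfft)[:n]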
To interpolate the HRIRs so that one can be provided for every point on the
sphere (except perhaps below -40 degrees of elevation, as the Listen project
measured no IRs below that), and to combine this with the sample-rate
upsampling, three-dimensional interpolation is needed: in the sample-rate,
azimuth, and elevation directions. So, to gain some experience with upsampling,
I started out with image resampling.
The most straightforward way to do this is to place the frequency spectrum of
the smaller image into that of the larger image (possibly tapering the edges of
the spectrum with a window such as Lanczos to lessen ringing), and to perform
the inverse transform. The equivalent in the pixel domain is convolution with a
sinc function. Theoretically, this works as long as the original signal is
bandlimited.
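In one dimension, and assuming a real, periodic signal, the construction looks
like this (a numpy sketch; the 2-D image case applies the same idea along each
axis):

    import numpy as np

    def upsample_periodic(x, factor):
        # Embed the spectrum of a real, periodic, bandlimited signal in a
        # larger spectrum and inverse-transform; in the signal domain this
        # is circular convolution with a periodic sinc (Dirichlet kernel).
        n = len(x)
        X = np.fft.rfft(x)
        if n % 2 == 0:
            X[-1] *= 0.5          # split the Nyquist bin over +/- frequencies
        big = np.zeros(factor * n // 2 + 1, dtype=complex)
        big[: len(X)] = X
        return factor * np.fft.irfft(big, factor * n)  # rescale for new length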
The difficulty now is that the sampling of the HRIRs is not equidistant in the
azimuth direction, though in this case, for once, it *is* periodic. (For those
wondering why the spacings differ: the impulse responses were recorded at
increments of 15 degrees around the center of the head, not around each of the
two ears, so seen from an ear the angular steps - and the distances - are not
all the same.) Additionally, as the measurements lie on a sphere, the azimuth
increment decreases at higher elevations. These peculiarities would seem to
rule out the use of Fourier transforms.
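A small numpy sketch of that geometry (the 1.95 m radius and the 8.75 cm ear
offset are assumptions of mine, not values taken from the Listen
documentation) shows how uneven the per-ear steps become:

    import numpy as np

    # Sources sit at equal 15-degree steps around the head centre, but the
    # angles and distances seen from an ear offset from that centre vary.
    R, a = 1.95, 0.0875
    az = np.radians(np.arange(0, 360, 15))
    src = R * np.stack([np.cos(az), np.sin(az)], axis=1)  # source positions
    rel = src - np.array([0.0, a])                        # relative to left ear
    dist = np.hypot(rel[:, 0], rel[:, 1])
    ang = np.unwrap(np.arctan2(rel[:, 1], rel[:, 0]))
    print(np.round(dist, 3))                   # per-ear distances: unequal
    print(np.round(np.degrees(np.diff(ang)), 2))  # steps: close to, not 15 deg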
The sinc trick, on the other hand, could still be used in this case. My
question, then, is whether this would be warranted - whether upsampling by sinc
convolution with non-equidistant samples is still in line with the sampling
theorem, or whether, perhaps, a slightly or wholly different function should be
used. Furthermore, it would be nice to do it in the frequency domain, though
I'm not sure that would be computationally more efficient when only a few
HRIRs, rather than a full upsampled spectrum, are needed.
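For illustration, one alternative along those lines: since the azimuth sampling
is periodic, a truncated Fourier series could be fitted to the non-uniform
samples by least squares and evaluated at any angle. A rough numpy sketch (the
function, its arguments, and the harmonic count are all made up for the
example):

    import numpy as np

    def fit_periodic(theta, y, n_harm):
        # Least-squares fit of a truncated Fourier series (n_harm harmonics)
        # to periodic data sampled at arbitrary angles theta; needs
        # len(theta) >= 2 * n_harm + 1 to be well posed. Returns a function
        # evaluating the band-limited fit at any angle.
        def design(t):
            t = np.asarray(t, dtype=float)
            cols = [np.ones_like(t)]
            for k in range(1, n_harm + 1):
                cols += [np.cos(k * t), np.sin(k * t)]
            return np.stack(cols, axis=-1)
        coef, *_ = np.linalg.lstsq(design(theta), np.asarray(y), rcond=None)
        return lambda t: design(t) @ coef

    # Hypothetical use: per-ear azimuths (non-uniform) and the value of one
    # HRIR tap at each of them, evaluated at an unmeasured direction:
    # h_at = fit_periodic(ear_azimuths, tap_values, n_harm=8)
    # print(h_at(np.radians(37.0)))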
Beyond that, one could ask whether this upsampling would be of any real-world
use. At the very least, the volume difference should be taken into account when
using the HRTFs to emulate farther-off sounds, and it is far from certain that
the result would closely resemble the actual HRTF as it would be recorded from
that distance. For points on the sphere at the same distance at which the
original HRTFs were recorded, however, I suppose it has a good chance of
working reasonably well. And I guess, more than anything else, I'm simply in
for another challenge.
Thanks in advance for any information.
Maarten
HRIR interpolation
Started by ●April 4, 2010
Reply by ●April 4, 2010
Maarten-
I read your post several times and I'm still not sure what it is that you actually want to do. It seems that you're
trying to take existing HRIR and HRTF data and interpolate/extrapolate to build a "spherical response" that gives a
reasonable approximation of the HRTF in full 3D... is that it?
-Jeff
Reply by ●April 5, 2010
Jeff,
> I read your post several times and I'm still not sure what it is that you actually want to do. It seems that you're
> trying to take existing HRIR and HRTF data and interpolate/extrapolate to build a "spherical response" that gives a
> reasonable approximation of the HRTF in full 3D... is that it?
Exactly! In the first instance, that is what I want to do: generalize a
(very) coarsely sampled HRIR sphere to a continuous one. Sorry if that
was not clear from my description; I'm not a native speaker.
These HRIRs could then be used to create virtual sound originating not
only from every point on the sphere, but also - taking changes in volume
and delay into account - from points closer by or farther away, where the
difference in angle as seen from the two ears is larger or smaller,
respectively.
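As a rough sketch of what I mean by the volume and delay changes (assuming
simple 1/r spreading and a straight-line path; a real near-field HRTF changes
in more ways than this):

    import numpy as np

    C = 343.0                  # speed of sound in m/s (assumed)

    def ear_gain_delay(src, ear, ref_dist):
        # Relative level and delay for a source nearer or farther than the
        # radius at which the HRIRs were measured: 1/r spherical spreading
        # for the gain, a straight-line path for the delay.
        r = np.linalg.norm(np.asarray(src, float) - np.asarray(ear, float))
        return ref_dist / r, (r - ref_dist) / C

    # Hypothetical numbers: 1.95 m measurement radius, ear at +8.75 cm:
    # gain, delay = ear_gain_delay([2.5, 1.0, 0.0], [0.0, 0.0875, 0.0], 1.95)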
Maarten