DSPRelated.com
Forums

Async. sample rate conversion for audio - various methods vs fractional delay filters (Farrow)

Started by gretzteam June 23, 2008
Hi,
I am currently investigating the use of fractional delay filters for the
asynchronous sample rate conversion of audio signals. I'm getting quite
confused with the various methods and hopefully some of you guys could
help.

First of all, let's make it clear that this is for 'asynchronous'
conversion. Nothing is known about the input to output ratio, except
that it's within a 'normal audio range - 8x up or down-' and it varies
very slowly. Let's also assume that we have a means of obtaining a high
precision (20 bits) output time for every output sample -> we know where
between two input samples we want to get the new sample. Basically,
let's focus on the filtering part of the problem.

Various methods that I think I understand:
1) Interpolate input signal by 4, 8 or 16, followed by some kind of
polynomial interpolator (linear, spline, lagrange...). This works well
if fsout>fsin. If not, we need to add a decimating filter. The bandwidth
of that filter would need to be adjusted depending on the output rate
which I guess is do-able since there are usually only a few fixed audio
frequencies. I guess the interpolating filter could also perform the
downsampling but it would then become a harder filter to design and
implement. This all seems like a hack to me? There must be a filter
structure that would do it all (would this be the fractional delay
filter?).

2) Use a quite large FIR filter to generate something like 2^16
intermediate input sample points and simply pick the closest one we
need. A polyphase approach could be taken so we only calculate say a
64-tap filter when an output sample is requested. This would require
quite a lot of memory to store all the coefficients though. However, the
'decimating' case can be handled nicely by that filter with some
scaling. I believe this is the approach taken by most of the commercial
ICs available (AD1896, CS8420).

3) I'm sure there are other ways to approach the problem, some mixture
of the above two etc...

4) Can this problem be solved more efficiently using fractional delay
filters (Farrow)? I am VERY confused about when those
filters come into play. The only reference I could find about audio is
here (hopefully some of you can read it):
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=870683
but it seems to be designed for only one set of fractional delay
elements (44.1 to 48 or vice-versa). I'm not sure if the exact same
approach would work with a varying delay element - Help! Also, I'm not
quite sure how this differs from method (1) described above. Can
fractional delay filter also take care of the downsampling part of the
problem without added complexity? 

Whouah that's longer than I thought! I know this is quite a lot of
questions, but any opinions would be appreciated! Also, please correct me
if I made any false statements.

Thank you very much,

gretz



On Jun 23, 9:22 pm, "gretzteam" <gretzt...@yahoo.com> wrote:
> I am currently investigating the use of fractional delay filters for the
> asynchronous sample rate conversion of audio signals. I'm getting quite
> confused with the various methods and hopefully some of you guys could
> help.
>
> First of all, let's make it clear that this is for 'asynchronous'
> conversion. Nothing is known about the input to output ratio, except
> that it's within a 'normal audio range - 8x up or down-' and it varies
> very slowly.
okay, the asynchronous spec means that the SRC ratio is adjusted (by a servo-control system that attempts to keep the output pointer of a buffer a fixed distance in time behind the input pointer). async does mean that you need a high precision fractional part to that output pointer...
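A minimal sketch of that output pointer, assuming a fixed-point accumulator with 20 fractional bits (the figure from the original post). The names `output_positions`, `FRAC_BITS` and `ratio_in_over_out` are made up for illustration, and the ratio is held constant here; in a real ASRC the servo loop would keep adjusting the step.

```python
FRAC_BITS = 20  # fractional precision, matching the 20 bits mentioned in the thread


def output_positions(ratio_in_over_out, n_out):
    """Return (index, frac) read positions for n_out output samples.

    ratio_in_over_out is Fs_in / Fs_out, i.e. how many input samples the
    read pointer advances per output sample. It is constant in this sketch;
    a servo loop would normally keep nudging `step`.
    """
    one = 1 << FRAC_BITS
    step = round(ratio_in_over_out * one)  # fixed-point pointer increment
    acc = 0                                # fixed-point read pointer
    positions = []
    for _ in range(n_out):
        index = acc >> FRAC_BITS           # integer part: which input sample
        frac = (acc & (one - 1)) / one     # fractional part in [0, 1)
        positions.append((index, frac))
        acc += step
    return positions


# 44.1 kHz in, 48 kHz out: the pointer advances 0.91875 input samples per output.
positions = output_positions(44100 / 48000, 4)
```

The integer part picks the input samples to filter and the fractional part selects the fractional delay, which is the only thing the interpolation stage needs to know.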
> Let's also assume that we have a mean to obtain a high
> precision (20 bits) output time for every output sample -> we know where
> between two input samples we want to get the new sample.
... dunno if that fractional delay needs to be 20 bits, but let's assume it's continuous.
> Basically, let's focus on the filtering part of the problem.
you mean the "interpolation part of the problem", right? this is the part that async SRC ("ASRC") has in common with synchronous SRC (where the SRC ratio is given and fixed).
> Various methods that I think I understand:
> 1) Interpolate input signal by 4, 8 or 16, followed by some kind of
> polynomial interpolator (linear, spline, lagrange...). This works well
> if fsout>fsin. If not, we need to add a decimating filter. The bandwidth
> of that filter would need to be adjusted depending on the output rate
> which I guess is do-able since there are usually only a few fixed audio
> frequency. I guess the interpolating filter could also perform the
> downsampling but it would now becomes a harder filter to design and
> implement. This all seems like a hack to me? There must be a filter
> structure that would do it all (would this be the fractional delay
> filter?).
when downsampling, the same FIR coef table can be used, but you stride through it at a rate of Fs_out/Fs_in compared to the stride when upsampling. unlike for downsampling (when additional LPFing is needed for anti-aliasing), different SRC ratios for upsampling do not change that stride in the coef table.
> 2) Use a quite large FIR filter to generate something like 2^16
> intermediate input sample points and simply pick the closest one we
> need. A polyphase approach could be taken so we only calculate say a
> 64-tap filter when an output sample is requested. This would require
> quite a lot of memory to store all the coefficients though.
you can linearly interpolate between adjacent phases. you don't need more than 512 phases (or equally spaced fractional delays) if you linearly interpolate (which doubles the FIR costs). at least not for audio apps (130 dB S/N)
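A rough sketch of that idea, assuming a Hann-windowed sinc prototype and 16 taps per phase to keep the example small (the thread talks about 64). `build_table`, `interpolate`, `TAPS` and `PHASES` are hypothetical names, not taken from any particular chip or library. Two neighbouring phases are run as FIRs and their outputs blended, which is where the doubled FIR cost comes from.

```python
import math

TAPS = 16      # taps per phase (assumption for the sketch; the thread mentions 64)
PHASES = 512   # equally spaced fractional delays


def windowed_sinc(t, half_width):
    """Hann-windowed sinc evaluated at offset t (in input samples)."""
    if abs(t) >= half_width:
        return 0.0
    window = 0.5 * (1.0 + math.cos(math.pi * t / half_width))
    sinc = 1.0 if t == 0.0 else math.sin(math.pi * t) / (math.pi * t)
    return window * sinc


def build_table():
    """One row of TAPS coefficients per phase, plus one extra row so that
    row p + 1 always exists when blending."""
    half = TAPS // 2
    return [[windowed_sinc(k - half + 1 - p / PHASES, half) for k in range(TAPS)]
            for p in range(PHASES + 1)]


def interpolate(x, n, frac, table):
    """Estimate x at time n + frac (0 <= frac < 1) from x[n-half+1 .. n+half]."""
    half = TAPS // 2
    pos = frac * PHASES
    p = int(pos)       # lower of the two adjacent phases
    a = pos - p        # blend weight towards phase p + 1
    window = x[n - half + 1:n + half + 1]
    y0 = sum(c * s for c, s in zip(table[p], window))
    y1 = sum(c * s for c, s in zip(table[p + 1], window))
    return y0 + a * (y1 - y0)   # linear interpolation between the two phases


table = build_table()
x = [math.sin(2 * math.pi * 0.05 * i) for i in range(64)]
y = interpolate(x, 30, 0.3, table)   # estimate of the sine at t = 30.3
```

With such a short Hann-windowed kernel the result is nowhere near 130 dB; the point is only the table layout and the blend between adjacent phases.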
> However, the
> 'decimating' case can be handled nicely by that filter with some
> scaling. I believe this is the approach taken by most of the commercial
> ICs available (AD1896, CS8420).
that's what i meant by changing the "stride".
> 3) I'm sure there are other ways to approach the problem, some mixture
> of the above two etc...
>
> 4) Can this problem be solved more efficiently using fractional delay
> filters (farrow) more efficiently? I am VERY confused about when those
> filters come into play. The only reference I could find about audio is
> here (hopefully some of you can read it):
> http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=870683
> but it seems to be designed for only one set of fractional delay
> elements (44.1 to 48 or vice-versa). I'm not sure if the exact same
> approach would work with a varrying delay element - Help! Also, I'm not
> quite sure how this differs from method (1) described above. Can
> fractional delay filter also take care of the downsampling part of the
> problem without added complexity?
>
> Whouah that's longer than I thought! I know this is quite a lot of
> questions, but any opinions would be appreciated! Also, please correct me
> if I made any false statements.
i forgot what the trick is in the Farrow SRC filters. can someone spell out the salient thing that the Farrow design does over "conventional"?

r b-j
On Jun 23, 10:04 pm, robert bristow-johnson
<r...@audioimagination.com> wrote:
> On Jun 23, 9:22 pm, "gretzteam" <gretzt...@yahoo.com> wrote:
>
> > I am currently investigating the use of fractional delay filters for the
> > asynchronous sample rate conversion of audio signals. I'm getting quite
> > confused with the various methods and hopefully some of you guys could
> > help.
>
> > First of all, let's make it clear that this is for 'asynchronous'
> > conversion. Nothing is known about the input to output ratio, except
> > that it's within a 'normal audio range - 8x up or down-' and it varies
> > very slowly.
Each output sample from an "asynchronous" resampler is just a bandlimited interpolation. You can treat each point individually without any reference to the rates and such, as long as you bandlimit below the minimum ceiling rate / 2.
> > 2) Use a quite large FIR filter to generate something like 2^16
> > intermediate input sample points and simply pick the closest one we
> > need. A polyphase approach could be taken so we only calculate say a
> > 64-tap filter when an output sample is requested. This would require
> > quite a lot of memory to store all the coefficients though.
>
> you can linearly interpolate between adjacent phases. you don't need
> more than 512 phases (or equally spaced fractional delays) if you
> linearly interpolate (which doubles the FIR costs). at least not for
> audio apps (130 dB S/N)
You don't need to use a finite number of taps if you can calculate your filter kernel on the fly (say with a simple windowed Sinc, or Farrow approximation). A fast PC can recalculate each tap of a von Hann windowed Sinc for each new sample fast enough to keep up with several channels of real time audio. No table needed. If not, the "phases" are just an interpolation table, and there's a lot of old literature on how to optimize interpolation tables for function approximation (finite differences, multi-resolution, and the such...)
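A small sketch of the table-free variant, assuming a von Hann windowed sinc that spans 16 input samples on each side of the output instant. `hann_sinc`, `resample_point` and the `half_width` default are illustrative choices, not values from the thread.

```python
import math


def hann_sinc(t, half_width):
    """von Hann windowed sinc at offset t (in input samples)."""
    if abs(t) >= half_width:
        return 0.0
    window = 0.5 * (1.0 + math.cos(math.pi * t / half_width))
    sinc = 1.0 if t == 0.0 else math.sin(math.pi * t) / (math.pi * t)
    return window * sinc


def resample_point(x, t, half_width=16):
    """Bandlimited estimate of x at fractional time t (in input samples).

    Every tap is recomputed for this particular t, so no coefficient
    table is stored and t can be arbitrarily fine.
    """
    n = math.floor(t)
    acc = 0.0
    for i in range(n - half_width + 1, n + half_width + 1):
        if 0 <= i < len(x):               # treat samples outside x as zero
            acc += x[i] * hann_sinc(t - i, half_width)
    return acc


x = [math.sin(2 * math.pi * 0.1 * i) for i in range(100)]
y = resample_point(x, 40.37)   # estimate of the sine at t = 40.37
```

Each output sample is treated on its own, so the same function works whatever the ratio is, as long as no anti-alias bandlimiting is needed (i.e. when upsampling).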
> > 4) Can this problem be solved more efficiently using fractional delay
> > filters (farrow) more efficiently? I am VERY confused about when those
> > filters come into play.
>
> i forgot what the trick is in the Farrow SRC filters. can someone
> spell out what the salient thing that the Farrow design does over
> "conventional"?
A Farrow filter uses polynomial interpolators for each "lobe" or half lobe of a windowed sinc (or other filter kernel). Useful in certain FPGAs and other hardware where pipeline-able MACs are cheaper than table look-ups.

IMHO. YMMV.
--
rhn A.T nicholson d.0.t C-o-M
http://www.nicholson.com/rhn/dsp.html
> 3) I'm sure there are other ways to approach the problem, some mixture > of the above two etc...
Yep.
The new ESS Sabre DAC seems to use a novel approach. This is a DAC which runs on its own clock (low jitter) and it accepts digital audio data in SPDIF or I2S with an asynchronous clock (high jitter).
It is a multibit sigma-delta DAC so the input is highly oversampled (don't remember how much) using polyphase filters, then the asynchronous resampling is done on the highly oversampled data which allows use of a very simple interpolation algorithm.
This is very clever (and patented).

It is a good example of lateral thought. Instead of solving a hard problem (asynchronous resampling at frequencies close to the Nyquist limit) it turns it into an easy problem (asynchronous resampling with a sample frequency way above the Nyquist limit).

The chip incorporates lots of other extremely clever tricks.
It has been reported as potentially the best sounding chip ever by most of those who tried to implement it.
On Jun 24, 7:09 am, PFC <li...@peufeu.com> wrote:
> > 3) I'm sure there are other ways to approach the problem, some mixture
> > of the above two etc...
>
> Yep.
> The new ESS Sabre DAC seems to use a novel approach. This is a DAC which
> runs on its own clock (low jitter) and it accepts digital audio data in
> SPDIF or I2S with an asynchronous clock (high jitter).
> It is a multibit sigma-delta DAC so the input is highly oversampled
> (don't remember how much) using polyphase filters, then the asynchronous
> resampling is done on the highly oversampled data which allows use of a
> very simple interpolation algorithm.
> This is very clever (and patented).
>
> It is a good example of lateral thought. Instead of solving a hard
> problem (asynchronous resampling at frequencies close to the Nyquist
> limit) it turns it into an easy problem (asynchronous resampling with a
> sample frequency way above the Nyquist limit).
>
> The chip incorporates lots of other extremely clever tricks.
> It has been reported as potentially the best sounding chip ever by most
> of those who tried to implement it.
One problem you will encounter in your design is the following. When the output sample-rate falls below the input sample-rate, there should be a bandlimiting filter that tracks the output rate so that input signals that are above fs_out/2 get filtered out.

One reason the IC folks (of which I am one) use the "single fractional-delay filter with coefficients calculated on-the-fly" approach is that there are clever ways to stretch the impulse response of this filter such that the cutoff frequency varies in an almost-continuous fashion. There will also be a gradual increase in group-delay as this filter scales down, as you might expect.

In many other approaches this can present quite a difficult problem, and often people end up with multiple sets of coefficients that are switched in for specific ranges of sample-rates. This may be acceptable for many consumer applications, or applications where the sample-rate ratios fall into specific narrow ranges, in which case the overhead of needing multiple sets of coefficients is not so high.

Bob Adams
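One way to picture the stretching, as a sketch only: take a Hann-windowed sinc and scale its time axis by min(1, Fs_out/Fs_in), so the cutoff follows the output rate and the kernel covers proportionally more input samples. This is an assumed textbook formulation, not a description of how any particular IC does it; `stretched_kernel`, `resample_point` and `ratio_out_over_in` are hypothetical names.

```python
import math


def stretched_kernel(t, scale, half_width):
    """Hann-windowed sinc with cutoff at scale * Fs_in / 2.

    t is in input samples. The kernel spans +/- half_width / scale input
    samples, so it gets longer (more group delay) as scale shrinks.
    """
    u = t * scale
    if abs(u) >= half_width:
        return 0.0
    window = 0.5 * (1.0 + math.cos(math.pi * u / half_width))
    sinc = 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)
    return scale * window * sinc   # the scale factor keeps the DC gain near 1


def resample_point(x, t, ratio_out_over_in, half_width=16):
    """Estimate x at fractional time t, bandlimited to the lower Nyquist."""
    scale = min(1.0, ratio_out_over_in)   # no stretching when upsampling
    reach = math.ceil(half_width / scale)
    n = math.floor(t)
    acc = 0.0
    for i in range(n - reach, n + reach + 1):
        if 0 <= i < len(x):
            acc += x[i] * stretched_kernel(t - i, scale, half_width)
    return acc


# Downsampling by 2: the cutoff sits at 0.25 cycles per input sample.
low = [math.sin(2 * math.pi * 0.05 * i) for i in range(200)]   # below cutoff
high = [math.sin(2 * math.pi * 0.40 * i) for i in range(200)]  # above cutoff
y_low = resample_point(low, 100.25, 0.5)
y_high = resample_point(high, 100.25, 0.5)
```

Because `scale` is just a number, the cutoff can move almost continuously with the measured ratio; the price is that the loop length (and the delay) grows as the output rate drops.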
>On Jun 23, 9:22 pm, "gretzteam" <gretzt...@yahoo.com> wrote:
>>
>> I am currently investigating the use of fractional delay filters for the
>> asynchronous sample rate conversion of audio signals. I'm getting quite
>> confused with the various methods and hopefully some of you guys could
>> help.
>>
>> First of all, let's make it clear that this is for 'asynchronous'
>> conversion. Nothing is known about the input to output ratio, except
>> that it's within a 'normal audio range - 8x up or down-' and it varies
>> very slowly.
>
>okay, the asynchrounous spec means that the SRC ratio is adjusted (by
>a servo-control system that attempts to keep the output pointer of a
>buffer, a fixed distance in time behind the input pointer). async
>does mean that you need a high precision fractional part to that
>output pointer...
>
>> Let's also assume that we have a mean to obtain a high
>> precision (20 bits) output time for every output sample -> we know where
>> between two input samples we want to get the new sample.
>
>... dunno if that fractional delay needs to be 20 bits, but let's
>assume it's continuous.
>
>> Basically, let's focus on the filtering part of the problem.
>
>you mean the "interpolation part of the problem", right? this is the
>part that async SRC ("ASRC") has in common with synchronous SRC (where
>the SRC ratio is given and fixed).
so far we are on the same page.
>> Various methods that I think I understand:
>> 1) Interpolate input signal by 4, 8 or 16, followed by some kind of
>> polynomial interpolator (linear, spline, lagrange...). This works well
>> if fsout>fsin. If not, we need to add a decimating filter. The bandwidth
>> of that filter would need to be adjusted depending on the output rate
>> which I guess is do-able since there are usually only a few fixed audio
>> frequency. I guess the interpolating filter could also perform the
>> downsampling but it would now becomes a harder filter to design and
>> implement. This all seems like a hack to me? There must be a filter
>> structure that would do it all (would this be the fractional delay
>> filter?).
>
>when downsampling, the same FIR coef table can be used, but you stride
>through it at a rate of Fs_out/Fs_in compared to the stride when
>upsamples. unlike for downsampling (when additional LPFing is needed
>for anti-aliasing), different SRC ratios for upsampling do not change
>that stride in the coef table.
>
>> 2) Use a quite large FIR filter to generate something like 2^16
>> intermediate input sample points and simply pick the closest one we
>> need. A polyphase approach could be taken so we only calculate say a
>> 64-tap filter when an output sample is requested. This would require
>> quite a lot of memory to store all the coefficients though.
>
>you can linearly interpolate between adjacent phases. you don't need
>more than 512 phases (or equally spaced fractional delays) if you
>linearly interpolate (which doubles the FIR costs). at least not for
>audio apps (130 dB S/N)
ok so isn't this saying that methods 1 and 2 are the same? Either you do a small interpolation (say 8x), followed by a good polynomial interpolation (3rd order spline or lagrange). Or you perform a better interpolation upfront (512x) followed by a simpler polynomial (linear). Or you do an even better interpolation (2^16), followed by a pretty poor interpolator (sample and hold). I guess depending on hardware, power and performance targets, there is a sweet spot in this solution space?

robert bristow-johnson wrote:


> i forgot what the trick is in the Farrow SRC filters. can someone
> spell out what the salient thing that the Farrow design does over
> "conventional"?
Farrow filter is the polynomial interpolation by means of the Tailor series. What is good about that: all of the derivatives can be computed in parallel in the hardware, and then the polynomial can be calculated using Horner rule. I.e. it is a good architecture for the hardware implementation.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
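To make the structure concrete, here is a minimal sketch assuming a cubic Lagrange interpolator, which is only one possible choice of polynomial for a Farrow structure. Four fixed FIR sub-filters produce the polynomial coefficients from the input samples, and the fractional delay `mu` enters only through Horner's rule. `FARROW`, `farrow_coeffs` and `farrow_eval` are names invented for the example.

```python
# Rows are the fixed FIR sub-filters (cubic Lagrange, an assumption of this
# sketch). Columns multiply x[n-1], x[n], x[n+1], x[n+2].
FARROW = [
    [0.0,   1.0,  0.0,  0.0],    # c0: coefficient of mu^0
    [-1/3, -1/2,  1.0, -1/6],    # c1: coefficient of mu^1
    [1/2,  -1.0,  1/2,  0.0],    # c2: coefficient of mu^2
    [-1/6,  1/2, -1/2,  1/6],    # c3: coefficient of mu^3
]


def farrow_coeffs(x, n):
    """Run the fixed sub-filters once for the interval between x[n] and x[n+1].

    These four dot products do not depend on mu, so in hardware they can be
    computed in parallel.
    """
    window = x[n - 1:n + 3]
    return [sum(h * s for h, s in zip(row, window)) for row in FARROW]


def farrow_eval(coeffs, mu):
    """Evaluate c0 + c1*mu + c2*mu^2 + c3*mu^3 with Horner's rule."""
    y = 0.0
    for c in reversed(coeffs):
        y = y * mu + c
    return y


# A cubic test signal: cubic Lagrange interpolation reproduces it exactly.
x = [float(i**3 - 2 * i) for i in range(10)]
coeffs = farrow_coeffs(x, 5)
y = farrow_eval(coeffs, 0.25)   # value at t = 5.25
# The same coefficients can be reused for several delays in one interval.
ys = [farrow_eval(coeffs, mu) for mu in (0.0, 0.125, 0.25, 0.5)]
```

The sub-filter outputs are computed once per input interval; each additional output sample that falls in the same interval costs only the three multiply-adds of the Horner evaluation.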
On Tue, 24 Jun 2008 09:06:01 -0500, Vladimir Vassilevsky
<antispam_bogus@hotmail.com> wrote:
>Farrow filter is the polynomial interpolation by means of the Tailor
>series.
Do you want a jacket with that sir? ;-)
Vladimir Vassilevsky <antispam_bogus@hotmail.com> writes:
> [...]
> Tailor

A tailor is one who mends your clothes. Brook Taylor was an English mathematician that pioneered the use of series in analysis.
--
%  Randy Yates                  % "She has an IQ of 1001, she has a jumpsuit
%% Fuquay-Varina, NC            %  on, and she's also a telephone."
%%% 919-577-9882                %
%%%% <yates@ieee.org>           % 'Yours Truly, 2095', *Time*, ELO
http://www.digitalsignallabs.com
>robert bristow-johnson wrote:
>
>> i forgot what the trick is in the Farrow SRC filters. can someone
>> spell out what the salient thing that the Farrow design does over
>> "conventional"?
>
>Farrow filter is the polynomial interpolation by means of the Tailor
>series. What is good about that: all of the derivatives can be computed
>in parallel in the hardware, and then the polynomial can be calculated
>using Horner rule. I.e. it is a good architecture for the hardware
>implementation.
Ok so if I understand right, this could lead to huge computation savings compared to method (2) of the original post. Basically, say the output rate is about 8 times faster than the input rate: we know we will need to calculate about 8 different output samples for a given set of input samples. With this Farrow structure, you could do the filtering only once on the input data, and then only need to do a few multiplies for each of the 8 different 'delay' values. Is this right?

Also, would there be any advantages in the case where the output rate is lower than the input rate?

gretz.