Forums

Sample rate conversion, Multistage and polyphase?

Started by baeksan October 20, 2008
Hi,

I'm working on an audio sample rate converter, particularly the conversion
from 44.1 to 48 kHz.  I have a version that works, using polyphase
decomposition of a huge FIR filter to convert from 44.1 to 48.  I tried
doing this as a multistage cascade instead, but it seems to take up more
CPU than just doing a single stage.

Does anyone have a pointer to an algorithm or block diagram where a
multistage cascade is combined with polyphase decomposition for sample
rate conversion?

The way I am doing it currently is running my polyphase decomposition
algorithm for each stage, so I loop over the samples once per stage.  I'm
sure there is a way to combine the different stages with the polyphase
decomposition so that I iterate over the samples only once, but I can't
seem to get my head around it.

Thanks!
On Oct 20, 4:59 pm, "baeksan" <breakcha...@yahoo.com> wrote:
> ...
Think about writing a function that is a simple interpolator/decimator.
For example, a 3:2 interpolator/decimator program would interpolate by a
factor of three and then decimate by a factor of two.  The trick is to
count through the phases and know where the zeros are, so you won't waste
time computing with zeroes.

Once you have such a routine written, you can simply cascade multiples of
these with different/similar interpolation/decimation ratios to arrive at
essentially any rational transformation you need.  The other part is to
design your FIR filter knowing where the overlap and don't-care regions
fall.

I did this years ago and it worked quite well.  I even had the code take
the impulse response and break it up into its various phases.

IHTH,
Clay
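The phase-counting idea can be sketched as follows.  This is a minimal,
hypothetical sketch (not Clay's actual code): for output m, the polyphase
branch is (m*M) mod L and the newest input sample used is (m*M)/L, so the
zero-valued samples inserted by interpolation are never multiplied.  h[] is
assumed to be an anti-imaging lowpass designed at L times the input rate,
with gain L and a length that is a multiple of L.

```c
/* Sketch of a single-stage polyphase L:M rational resampler.
 * For output m, branch p = (m*M) % L selects one polyphase
 * decomposition of h[], and n = (m*M) / L is the newest input
 * sample involved, so the zeros inserted by interpolation are
 * never touched.  h[] has ntaps = L * (taps per branch)
 * coefficients (design it with gain L for interpolation). */
#include <stddef.h>

size_t resample(const double *h, int ntaps, int L, int M,
                const double *x, size_t nx, double *y)
{
    int tpp = ntaps / L;                 /* taps per polyphase branch */
    size_t m = 0;
    for (;; m++) {
        size_t n = (m * (size_t)M) / (size_t)L;        /* newest input index */
        int    p = (int)((m * (size_t)M) % (size_t)L); /* polyphase branch   */
        if (n >= nx)
            break;                       /* need more input */
        double acc = 0.0;
        for (int k = 0; k < tpp; k++) {
            long j = (long)n - k;        /* assume x[j] = 0 for j < 0 */
            if (j >= 0)
                acc += h[p + k * L] * x[j];
        }
        y[m] = acc;
    }
    return m;                            /* number of outputs produced */
}
```

With L = 3, M = 2 this is the 3:2 example above; cascading calls with
different L:M values gives any rational ratio.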
Clay,

Thanks for your reply.  The problem I am having is that the cascade seems
to take more CPU than just doing one stage, I think because it loops
through every sample multiple times.  For example, the multistage cascade
2:1 followed by 80:147 seems to take more CPU than a single 160:147 sample
rate conversion.  Is there a way to do the multiple stages in one function
call, without iterating through every sample multiple times?
On Oct 20, 7:10 pm, "baeksan" <breakcha...@yahoo.com> wrote:
> ...
I think a three or four stage approach would be your best bet for a 44.1
to 48 conversion.  Euclid's method reduces this to 160:147 (stated as an
interpolation/decimation ratio), as you already know, and prime factoring
these gives (2*2*2*2*2*5):(3*7*7).  So you could try (5:1), (4:7), (4:7),
and finally (2:3) as a possible set of interpolation/decimation ratios,
ordered so that no intermediate rate drops below 48 kHz.  Hopefully you
are only trying to keep something like 0-20 kHz and not get close to
22.05 kHz.

Also, when you design your filter (only one for each interpolator/
decimator stage), note that it isn't designed as a strict lowpass filter.
The folding over of the spectra creates don't-care regions.  So by not
forcing your lowpass filter to stop everything above a certain frequency,
but rather having it stop only certain bands above a critical frequency
(the ones that map on top of the desired passband), you allow your filter
to have fewer taps and therefore save resources.

The cascade structure basically has you call the output stage, which in
turn calls the stage before it as necessary to feed it.  The input stage
requests data from the source as needed.  As I stated before, I've used
this approach quite successfully for converting various rates common in
computer audio.

The motivation behind factoring the ratios into ones with small prime
numbers is that high interpolation and decimation ratios force sharp
transitions in the interpolation filter's response, which makes for very
long (high number of taps) filters.  So the primes allow you to use the
lowest possible ratios.

IHTH,
Clay

p.s. Maybe I need to write a paper giving the details of this approach
along with C code actually implementing it.  I already have the code
written.
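The pull-style cascade described above (the output stage asks its upstream
stage for input as needed) can be sketched like this.  All names are
hypothetical, and the resampling stage here just drops every other sample
instead of running a real polyphase filter, to keep the plumbing visible:

```c
/* Pull-model cascade: each stage exposes a pull() that fills dst
 * with up to want samples, calling its upstream stage for raw
 * input as needed.  Only the source stage uses src/pos/len. */
#include <stddef.h>

typedef struct Stage Stage;
struct Stage {
    size_t (*pull)(Stage *self, double *dst, size_t want);
    Stage *upstream;          /* NULL for the source stage */
    const double *src;        /* source-stage state        */
    size_t pos, len;
};

/* Source: hands out samples from a buffer until it runs dry. */
size_t source_pull(Stage *s, double *dst, size_t want)
{
    size_t n = 0;
    while (n < want && s->pos < s->len)
        dst[n++] = s->src[s->pos++];
    return n;
}

/* Stand-in for a resampler stage: a 1:2 decimator that pulls two
 * upstream samples per output.  A real stage would run its
 * polyphase filter here instead of keeping every other sample. */
size_t decim2_pull(Stage *s, double *dst, size_t want)
{
    double pair[2];
    size_t made = 0;
    while (made < want) {
        if (s->upstream->pull(s->upstream, pair, 2) < 2)
            break;            /* upstream exhausted */
        dst[made++] = pair[0];
    }
    return made;
}
```

Chaining such stages means one pull on the last stage drives the whole
cascade, so the signal is traversed once instead of once per stage.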
Thanks for the reply.  I will try out 4 stages and see if it improves.  I
have a decent understanding of the design of the lowpass filters for each
stage; I'm referencing Crochiere and Rabiner for this.  I guess I just
need to analyze the cascade structure in more depth to actually write the
C code for the implementation.  Theory vs. actual implementation seems to
be a whole different ball game.  If you can share the C code, that would
be great, or even just some pseudocode would be very helpful, especially
for the cascaded stages.

Thanks

On Oct 21, 9:39 am, "baeksan" <breakcha...@yahoo.com> wrote:
> ...
> I will try out 4 stages and see if it improves.
> ...
I'd suggest 8:7; 4:3; 5:7 to keep the maximum intermediate sampling frequency down.
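Making the intermediate rates explicit (my arithmetic, not from the post):
each L:M stage scales the rate by L/M, and with this ordering the chain
peaks at 67.2 kHz, versus 220.5 kHz if the x5 interpolation came first.

```c
/* Intermediate sample rates for the 8:7, 4:3, 5:7 cascade,
 * starting from 44.1 kHz.  Each L:M stage scales the rate by L/M
 * (integer math is exact here: fs*L is divisible by M at each step). */
int apply_stage(int fs, int L, int M)
{
    return fs * L / M;
}
```

The chain runs 44100 -> 50400 -> 67200 -> 48000, so the rate never exceeds
67.2 kHz and never falls below the 48 kHz target.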
> ...
> Theory vs actual implementation seems to be a whole different ball game.
> ...
You mention CPU time and C code.  If you are measuring times on a desktop
computer, you may need to pick data structure sizes that allow the entire
chain to fit in cache.  Big data structures that force a write back to
main memory after each stage, and a read into cache going into each stage,
can slow the processing considerably.

Dale B. Dalrymple
http://dbdimages.com
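One way to act on this (a sketch with hypothetical names; real
rate-changing stages would need scratch buffers sized for the highest
intermediate rate) is to run the whole cascade over one small block at a
time, so the working set stays cache-resident, instead of streaming the
entire signal through each stage in turn:

```c
/* Run an nstages-deep cascade over the input one small block at a
 * time, ping-ponging between two cache-sized scratch buffers, so no
 * full-signal buffer is written back to main memory between stages.
 * For simplicity the stages here preserve the sample count. */
#include <stddef.h>

#define BLOCK 1024

typedef size_t (*stage_fn)(const double *in, size_t n, double *out);

size_t run_cascade(stage_fn *stages, int nstages,
                   const double *in, size_t nin, double *out)
{
    double a[BLOCK], b[BLOCK];           /* small, cache-resident */
    size_t produced = 0;
    for (size_t off = 0; off < nin; off += BLOCK) {
        size_t n = nin - off;
        if (n > BLOCK) n = BLOCK;
        const double *cur = in + off;
        double *dst = a, *spare = b;
        for (int s = 0; s < nstages; s++) {
            n = stages[s](cur, n, dst);  /* one stage, one block     */
            cur = dst;                   /* its output feeds the next */
            { double *t = dst; dst = spare; spare = t; }
        }
        for (size_t i = 0; i < n; i++)   /* flush the finished block */
            out[produced++] = cur[i];
    }
    return produced;
}

/* trivial stage for demonstration: doubles each sample */
size_t gain2(const double *in, size_t n, double *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = 2.0 * in[i];
    return n;
}
```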
To fit in the cache, I should use aligned data that can fit into all the
registers, correct?  So in my function call, I should try to declare only
variables/pointers that will fit into the number of allotted registers?

Thanks
On Oct 22, 10:45 am, "baeksan" <breakcha...@yahoo.com> wrote:
> ...
Cache is an intermediate memory between the CPU with its registers and the
slower main memory.  I suggest you read a cache application note for a
detailed description, for example:

http://focus.ti.com.cn/cn/lit/an/spra756/spra756.pdf

I don't suggest this CPU is the one you use or should use, but the
concepts and issues of cache use are similar.  The details are a function
of the processor your code runs on and its cache size.  The operating
system and compiler may also affect what you can do to ensure in-cache
execution.

Dale B. Dalrymple