Dear comp.dsp, I am trying to implement a Moorer reverberator in C. The implementation is fine, but I wanted to know how to optimize the initial tapped delay filter. That is, I have a an initial delay line (circular) buffer of 3520 samples, where I have 18 taps at various positions with various gains. Right now, I am doing the following in a loop: <code> er_buffer[er_index] = new_sample; #define CIRC_INDEX(x) ((ER_LENGTH + (x)) % ER_LENGTH) temp_buffer[i] = (ALPHA_TAP_1 * er_buffer[CIRC_INDEX(er_index + TAP_1)] + ALPHA_TAP_2 * er_buffer[CIRC_INDEX(er_index + TAP_2)] + ALPHA_TAP_3 * er_buffer[CIRC_INDEX(er_index + TAP_3)] ... + ALPHA_TAP_18 * er_buffer[CIRC_INDEX(er_index + TAP_18)]); er_index = (er_index + 1) % ER_LENGTH; </code> Obviously, this makes the code spend a *lot* of time on the division (modulo) operator. What is the correct and efficient way of doing this? I could find various suggestions on the internet, but couldn't find one which suited this appropriately. Thanks. Kumar
Optimizing a tapped delay (FIR) filter implementation
Started by ●June 29, 2006
Reply by ●June 29, 20062006-06-29
It looks like there is some jinx! Moments after I post a query on comp.dsp, I get the answer: http://www.dspguru.com/info/faqs/fir/implemnt.htm However, in the document, the idea proposed is doubling the length of one of the FIR data lines, and duplicating the buffer contents, so that the overflow is taken care of. Though this is very efficient when it comes to cycles, it eats huge memory. I was wondering whether there was a way to have the cake and eat it too, by striking a balance somewhere in between. Or, if you feel about 20 kilo bytes of memory duplicated is all right (this is for a PC, not a DSP), please tell me. Thanks. Kumar
Reply by ●June 29, 20062006-06-29
Kumar Appaiah wrote:> It looks like there is some jinx! Moments after I post a query on > comp.dsp, I get the answer: > > http://www.dspguru.com/info/faqs/fir/implemnt.htmThe most obvious (I actually came up with this method before the page loaded in my browser), simplest and most efficient method if you don't have hardware circular buffer to compute FIR filters is number 1 in the above list. No extra cost, no extra memory. Why don't you try it? Regards, Andor
Reply by ●June 29, 20062006-06-29
Kumar Appaiah wrote:> It looks like there is some jinx! Moments after I post a query on > comp.dsp, I get the answer: > > http://www.dspguru.com/info/faqs/fir/implemnt.htm > > However, in the document, the idea proposed is doubling the length of > one of the FIR data lines, and duplicating the buffer contents, so that > the overflow is taken care of. Though this is very efficient when it > comes to cycles, it eats huge memory. I was wondering whether there was > a way to have the cake and eat it too, by striking a balance somewhere > in between. > > Or, if you feel about 20 kilo bytes of memory duplicated is all right > (this is for a PC, not a DSP), please tell me. > > Thanks. > > KumarSame effect can be achieved by taking advantage of the mmu. Many OSs allow you to map two pages to same physical address. In the sense of used physical memory, this is free. -- Jani Huhtanen Tampere University of Technology, Pori
Reply by ●June 29, 20062006-06-29
Kumar Appaiah wrote:> It looks like there is some jinx! Moments after I post a query on > comp.dsp, I get the answer: > > http://www.dspguru.com/info/faqs/fir/implemnt.htm > > However, in the document, the idea proposed is doubling the length of > one of the FIR data lines, and duplicating the buffer contents, so that > the overflow is taken care of. Though this is very efficient when it > comes to cycles, it eats huge memory. I was wondering whether there was > a way to have the cake and eat it too, by striking a balance somewhere > in between. > > Or, if you feel about 20 kilo bytes of memory duplicated is all right > (this is for a PC, not a DSP), please tell me.How much memory do you have on your PC? 20 Kb is trivial by modern standards. Jerry -- Engineering is the art of making what you want from things you can get. �����������������������������������������������������������������������
Reply by ●June 30, 20062006-06-30
Jani Huhtanen wrote:> Same effect can be achieved by taking advantage of the mmu. Many OSs allow > you to map two pages to same physical address. In the sense of used > physical memory, this is free.A quick search didn't yield much information. Could you just give me some pointers on how to do this? Portability isn't much of an issue, so I would like to learn about this. BTW, I am on GNU/Linux. Thanks. Kumar
Reply by ●June 30, 20062006-06-30
Andor wrote:> The most obvious (I actually came up with this method before the page > loaded in my browser), simplest and most efficient method if you don't > have hardware circular buffer to compute FIR filters is number 1 in the > above list. No extra cost, no extra memory.Well, it does double the memory use. Of course, 20 KB is trivial in PCs, I guess. And it runs about 7 times faster! Thanks. Kumar
Reply by ●June 30, 20062006-06-30
Kumar Appaiah wrote:> > Jani Huhtanen wrote: > >> Same effect can be achieved by taking advantage of the mmu. Many OSs >> allow you to map two pages to same physical address. In the sense of used >> physical memory, this is free. > > A quick search didn't yield much information. Could you just give me > some pointers on how to do this? Portability isn't much of an issue, so > I would like to learn about this. > > BTW, I am on GNU/Linux. > > Thanks. > > KumarSearch for 'ring buffer' and 'mmap': http://www.google.com/search?q=ring+buffer+mmap&ie=UTF-8&oe=UTF-8 One promising result: http://freshmeat.net/projects/vrb/ And I believe this can be done on any OS that supports some sort of shared memory. That is, pretty much on every modern OS. I have myself implemented such ring buffer for linux (or any posix compliant OS) as a C++ class. However, the code is not in such a shape that I feel comfortable sharing it. If you have problems with the implementation I can help with the details. -- Jani Huhtanen Tampere University of Technology, Pori
Reply by ●June 30, 20062006-06-30
Kumar Appaiah wrote:> Andor wrote: > > > The most obvious (I actually came up with this method before the page > > loaded in my browser), simplest and most efficient method if you don't > > have hardware circular buffer to compute FIR filters is number 1 in the > > above list. No extra cost, no extra memory. > > Well, it does double the memory use. Of course, 20 KB is trivial in > PCs, I guess.You must be doing something wrong - the amount of memory required does not increase! All you do is split the computation into two parts: the accumulation before the circular buffer wrap-around, and the accumulation after the wrap-around. You have to compute the wrap around point beforehand, then use two for loops, one starting from the current read/write position going to the wrap-around point, the other starting from zero to the current position -1.> And it runs about 7 times faster!Using the above method, you only need to compute one modulo operation per filter cycle and have no additional memory requirement. Regards, Andor
Reply by ●June 30, 20062006-06-30
Kumar Appaiah wrote:> ... I have a an initial delay line > (circular) buffer of 3520 samples, where I have 18 taps > > at various positions with various gains. Right now, I am doing the > following in a loop:...> Obviously, this makes the code spend a *lot* of time on the division > (modulo) operator. What is the correct and efficient waybump the size of the circular array up to a power of 2 (in this case 4096) and then use the bit-wise & operator to mask (er_index+TAP_2) with (ER_LENGTH-1). if your compiler is dumb and subtracts the 1 each time rather than uses a constant value of (ER_LENGTH-1), then #define a new symbol. the & operator is much faster than the % operator. #define ER_LENGTH 4096 % must be power of 2 #define CIRC_INDEX(x) ( (x) & (ER_LENGTH-1) ) r b-j






