DSPRelated.com
Forums

Naive FIR filter question

Started by Rick Lyons December 12, 2003


Hi Guys,

   I have a simple (I think) filtering question.
I've programmed (years ago) microcontrollers, 
but I've never programmed any of the modern DSP 
chips with their "zero-overhead looping" and 
single-cycle multiply-and-accumulate features.
(I know, I know.  I should get a DSP starter kit!)
So the DSP chips are efficient for implementing 
FIR filtering.

Am I correct in assuming that programmable DSP 
chips *cannot* take advantage of the "folded FIR 
structure" that's possible when a FIR filter's 
coefficients are symmetrical.  (You remember that 
"folded FIR" idea, ... where an M-tap FIR filter 
can be implemented with roughly M/2 multiplies 
per output sample.)

I've searched around on the Internet, but haven't 
found a solid answer to my question.

Thanks,
[-Rick-]


Rick Lyons wrote:
> Hi Guys,
[deletia]
> Am I correct in assuming that programmable DSP > chips *cannot* take advantage of the "folded FIR > structure" that's possible when a FIR filter's > coefficients are symmetrical. (You remember that > "folded FIR" idea, ... where an M-tap FIR filter > can be implemented with roughly M/2 multiplies > per output sample.)
Hi Rick, The folded structure still requires M-1 adds, so there are still M-1 cycles requiring arithmetic. Since DSPs can do the multiplies in parallel with the adds (and in a single cycle), there's no advantage to eliminating half the multiplies. I don't think execution would be any slower with the folded structure, but it's more difficult to write, and more difficult to read. Well - maybe it wouldn't be as fast if you count the more complicated loop setup. Perhaps the setup penalty could be eliminated too, but it would almost certainly cause brain damage to anyone embarking on such an adventure. -- Jim Thomas Principal Applications Engineer Bittware, Inc jthomas@bittware.com http://www.bittware.com (703) 779-7770 The secret to enjoying your job is to have a hobby that's even worse - Calvin's Dad
Rick Lyons wrote:

> Hi Guys, > > I have a simple (I think) filtering question. > I've programmed (years ago) microcontrollers, > but I've never programmed any of the modern DSP > chips with their "zero-overhead looping" and > single-cycle multiply-and-accumulate features. > (I know, I know. I should get a DSP starter kit!) > So the DSP chips are efficient for implementing > FIR filtering. > > Am I correct in assuming that programmable DSP > chips *cannot* take advantage of the "folded FIR > structure" that's possible when a FIR filter's > coefficients are symmetrical. (You remember that > "folded FIR" idea, ... where an M-tap FIR filter > can be implemented with roughly M/2 multiplies > per output sample.) > > I've searched around on the Internet, but haven't > found a solid answer to my question. > > Thanks, > [-Rick-]
You are correct, as usual. Folding amounts to adding pairs of coefficients before multiplying by a filter coefficient. It entails a little extra bookkeeping, but that's often trivial compared to replacing a multiply with an add. When multiplication and addition take the same time, the bookkeeping is a dead loss. Jerry -- Engineering is the art of making what you want from things you can get. �����������������������������������������������������������������������
"Jerry Avins" <jya@ieee.org> wrote in message
news:3fd9ea9c$0$14948$61fed72c@news.rcn.com...
> Rick Lyons wrote: > > > Hi Guys, > > > > I have a simple (I think) filtering question. > > I've programmed (years ago) microcontrollers, > > but I've never programmed any of the modern DSP > > chips with their "zero-overhead looping" and > > single-cycle multiply-and-accumulate features. > > (I know, I know. I should get a DSP starter kit!) > > So the DSP chips are efficient for implementing > > FIR filtering. > > > > Am I correct in assuming that programmable DSP > > chips *cannot* take advantage of the "folded FIR > > structure" that's possible when a FIR filter's > > coefficients are symmetrical. (You remember that > > "folded FIR" idea, ... where an M-tap FIR filter > > can be implemented with roughly M/2 multiplies > > per output sample.) > > > > I've searched around on the Internet, but haven't > > found a solid answer to my question. > > > > Thanks, > > [-Rick-] > > You are correct, as usual. Folding amounts to adding pairs of > coefficients before multiplying by a filter coefficient. It entails a > little extra bookkeeping, but that's often trivial compared to replacing > a multiply with an add. When multiplication and addition take the same > time, the bookkeeping is a dead loss.
"When multiplication and addition take the same time" which is almost always the case in modern DSP chips with their dedicated hardware multiplier.
Jon Harris wrote:

> "Jerry Avins" <jya@ieee.org> wrote in message > news:3fd9ea9c$0$14948$61fed72c@news.rcn.com... > >>Rick Lyons wrote:
...
>>>Am I correct in assuming that programmable DSP >>>chips *cannot* take advantage of the "folded FIR >>>structure" that's possible when a FIR filter's >>>coefficients are symmetrical. ...
>>You are correct, as usual. ...
>> ... When multiplication and addition take the same >>time, the bookkeeping is a dead loss. > > > "When multiplication and addition take the same time" which is almost always > the case in modern DSP chips with their dedicated hardware multiplier.
That's why I wrote that he was correct. :-) Jerry -- Engineering is the art of making what you want from things you can get. &#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;
Hi all

You might want to look at the C54xx FIRS instruction.  It has the extra 
adder needed to impliment the following in 1 cycle

Acc = Acc + (*coef++ * (*mema++ + *memb++));

The data and coefficients do however need to be correctly placed in 
memory such that 3 reads can occur per cycle :-)

Best regards,
Keith Larson
=============================
Jim Thomas wrote:
> Rick Lyons wrote: > >> Hi Guys, > > [deletia] > >> Am I correct in assuming that programmable DSP chips *cannot* take >> advantage of the "folded FIR structure" that's possible when a FIR >> filter's coefficients are symmetrical. (You remember that "folded >> FIR" idea, ... where an M-tap FIR filter can be implemented with >> roughly M/2 multiplies per output sample.) > > > Hi Rick, > > The folded structure still requires M-1 adds, so there are still M-1 > cycles requiring arithmetic. Since DSPs can do the multiplies in > parallel with the adds (and in a single cycle), there's no advantage to > eliminating half the multiplies. > > I don't think execution would be any slower with the folded structure, > but it's more difficult to write, and more difficult to read. Well - > maybe it wouldn't be as fast if you count the more complicated loop > setup. Perhaps the setup penalty could be eliminated too, but it would > almost certainly cause brain damage to anyone embarking on such an > adventure.
+------------------------------------------+ |Keith Larson | |Member Group Technical Staff | |Texas Instruments Incorporated | | | | 281-274-3288 | | k-larson2@ti.com | |------------------------------------------+ | TMS320C3x/C4x/VC33 Applications | | | | $150 TMS320VC33 DSK's ARE AVAILABLE NOW | | | | TMS320VC33 | | The lowest cost and lowest power | | floating point DSP on the planet! | | 500uw/Mflop | +------------------------------------------+
Keith Larson wrote:

> Hi all > > You might want to look at the C54xx FIRS instruction. It has the extra > adder needed to impliment the following in 1 cycle > > Acc = Acc + (*coef++ * (*mema++ + *memb++)); > > The data and coefficients do however need to be correctly placed in > memory such that 3 reads can occur per cycle :-) > > Best regards, > Keith Larson > =============================
Oh, Wow! Now if we can get the second addend to be a subtrahend instead, We can fold Hilbert Transformers, too! Jerry -- Engineering is the art of making what you want from things you can get. &#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;
Jim Thomas <jthomas@bittware.com> wrote in news:vtjqv631438883
@corp.supernews.com:

> Rick Lyons wrote: >> Hi Guys, > [deletia] >> Am I correct in assuming that programmable DSP >> chips *cannot* take advantage of the "folded FIR >> structure" that's possible when a FIR filter's >> coefficients are symmetrical. (You remember that >> "folded FIR" idea, ... where an M-tap FIR filter >> can be implemented with roughly M/2 multiplies >> per output sample.) > > Hi Rick, > > The folded structure still requires M-1 adds, so there are still M-1 > cycles requiring arithmetic. Since DSPs can do the multiplies in > parallel with the adds (and in a single cycle), there's no advantage to > eliminating half the multiplies. > > I don't think execution would be any slower with the folded structure, > but it's more difficult to write, and more difficult to read. Well - > maybe it wouldn't be as fast if you count the more complicated loop > setup. Perhaps the setup penalty could be eliminated too, but it would > almost certainly cause brain damage to anyone embarking on such an > adventure. >
I use a bit of a hybrid to this approach. Since FIRs are often symmetrical, I have a function that uses a coefficient file of length M/2 +1 (odd length M). I still do MACs on each element of the delay line. It takes just about the same number of cycles to execute but in cases where memory is in short supply and there are lots of filters, it saves memory. -- Al Clark Danville Signal Processing, Inc. -------------------------------------------------------------------- Purveyors of Fine DSP Hardware and other Cool Stuff Available at http://www.danvillesignal.com