Convert cascade to parallel

Started by Jon Harris February 23, 2004
I have implemented various IIR filters, up to 16th order, as a series of
biquads in cascade.  However, I now am attempting to optimize this code to
run on a SIMD processor (SHARC 21161).  For that purpose, it appears that a
parallel filter form would be much better suited since it would allow
working on 2 paths in parallel.

Is there any simple way to convert my cascade coefficients to parallel
coefficients?  I need to do this in "real-time" as my filters can be
arbitrarily adjusted by the user in real-time.  The literature seems to use
partial fraction expansion to do this, but that doesn't seem too friendly to
a real-time implementation.
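
To illustrate why the conversion is awkward to do on the fly, here is a minimal sketch (not from the thread) for the simplest possible case: two first-order all-pole sections with distinct real poles. The names `p1`, `p2`, `cascade`, `parallel` and `parallel_gains` are made up for illustration. The parallel gains come from a partial fraction expansion, which needs the poles explicitly and divides by their difference, so poles that sit close together give large gains that nearly cancel.

```python
def cascade(x, p1, p2):
    """Two one-pole sections in series: 1 / ((1 - p1 z^-1)(1 - p2 z^-1))."""
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y1 = sample + p1 * y1   # first section
        y2 = y1 + p2 * y2       # second section is fed by the first
        out.append(y2)
    return out


def parallel_gains(p1, p2):
    """Partial fraction gains g1, g2 such that
    1 / ((1 - p1 q)(1 - p2 q)) == g1 / (1 - p1 q) + g2 / (1 - p2 q)."""
    if p1 == p2:
        raise ValueError("repeated poles need a different expansion")
    return p1 / (p1 - p2), -p2 / (p1 - p2)


def parallel(x, p1, p2):
    """Same transfer function as cascade(), as two independent branches."""
    g1, g2 = parallel_gains(p1, p2)
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y1 = sample + p1 * y1   # branch 1 sees the input directly
        y2 = sample + p2 * y2   # branch 2 sees the input directly
        out.append(g1 * y1 + g2 * y2)
    return out
```

For biquads with complex poles the same idea applies per conjugate pair, but every user adjustment would require re-deriving the branch numerators from all of the poles, which is the part that is unfriendly to real-time updates.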

If anyone has any other advice on implementing IIR biquads to take advantage
of a SIMD architecture, that would be appreciated as well.

FYI, the application is audio.

-Jon


Jon Harris wrote:
> I have implemented various IIR filters, up to 16th order, as a series of
> biquads in cascade. However, I now am attempting to optimize this code to
> run on a SIMD processor (SHARC 21161). For that purpose, it appears that a
> parallel filter form would be much better suited since it would allow
> working on 2 paths in parallel.

Hi Jon,

my newsreader shows no answer to your post. Just to keep the track
record of comp.dsp up, I'll give it a shot :).

I disagree with your assumption that the best way to utilize SIMD for
IIR filters is to parallelize the filters. This messes up your filter
architecture (bad, since you probably use one optimized for minimum
noise or minimum instructions). Rather, I would suggest the following
approaches:

i) If it is possible, process two (or multiples of two) channels with
the same processor - this is the easiest way to utilize SIMD.

ii) Write your own IIR biquad routine using SIMD (you should be able to
cut down from 4 to 2 instructions, or perhaps 3, I haven't done this) -
or maybe you can download optimized SIMD biquad code from ADI.

iii) By using additional delays (one between each biquad), you can
actually use SIMD to calculate two biquads simultaneously with the
standard IIR biquad code (just load the input to each biquad from the
delay).

Regards,
Andor

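A rough sketch of what suggestion (i) amounts to, written in plain Python rather than SHARC assembly: one biquad routine advances two independent lanes per call, the way a SIMD processor would run both processing elements from a single instruction stream. The transposed Direct Form II structure, the denominator convention 1 + a1 z^-1 + a2 z^-2, and the names `biquad_step_2lane` and `filter_stereo` are assumptions for illustration only; the thread does not specify them.

```python
def biquad_step_2lane(x, coeffs, state):
    """Advance two independent biquads by one sample.

    x      -- pair of input samples, one per lane
    coeffs -- pair of (b0, b1, b2, a1, a2) tuples, one per lane
    state  -- pair of [s1, s2] lists, one per lane, updated in place
    """
    y = [0.0, 0.0]
    for lane in range(2):  # the two lanes never exchange data
        b0, b1, b2, a1, a2 = coeffs[lane]
        s = state[lane]
        y[lane] = b0 * x[lane] + s[0]
        s[0] = b1 * x[lane] - a1 * y[lane] + s[1]
        s[1] = b2 * x[lane] - a2 * y[lane]
    return tuple(y)


def filter_stereo(left, right, c):
    """Apply the same biquad coefficients c to a left/right pair of signals."""
    state = [[0.0, 0.0], [0.0, 0.0]]
    return [biquad_step_2lane((l, r), (c, c), state)
            for l, r in zip(left, right)]
```

Because the lanes are independent, passing two different coefficient tuples instead of `(c, c)` gives two different filters for the same cost per call.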
Thanks for your reply, Andor.  Indeed, it has been the only one.  Your
suggestions are good ones, though some are not practical for my application.
A few comments:

i) My application calls for both "mono" and "stereo" filters.  As you can
imagine, the stereo filters were very easy to convert to SIMD and yield an
excellent performance increase.  However, I am now trying to deal with the
mono ones.  A possibility would be to offer an additional "dual mono" filter
that allows different filtering on each channel.  (The existing stereo
module performs identical filtering on 2 channels.)  Unfortunately, this
requires quite a bit more work than just re-writing the DSP code and users
would need to be savvy enough to take advantage of it.

ii) I've looked into this a bit, but there doesn't seem to be any easy
solution.  I'm using the "normalized 4-multiply lattice/ladder" form (not
Direct Form), and so far I haven't found any way to effectively "SIMD-ize"
it.  The ideas I have come up with would only save a few cycles per band but
would also cause increased overhead to the point of not making it
worthwhile.

iii) That may be the most feasible.  While latency/delay is an issue in our
system, we have an existing mechanism in place to compensate for it.  Am I
right in supposing that if I processed biquads in pairs, I would only incur
an over-all 1-sample delay even if I had dozens of biquads?  For example,
with a 20-band filter, the even bands from "now" and the odd bands from "one
sample ago" would be processed in parallel.  Make sense?  I would still need
to deal with the special case of odd numbers of bands, but that is solvable.

Jon Harris wrote:
...
> iii) That may be the most feasible. While latency/delay is an issue in our
> system, we have an existing mechanism in place to compensate for it. Am I
> right in supposing that if I processed biquads in pairs, I would only incur
> an over-all 1-sample delay even if I had dozens of biquads? For example,
> with a 20-band filter, the even bands from "now" and the odd bands from "one
> sample ago" would be processed in parallel. Make sense? I would still need
> to deal with the special case of odd numbers of bands, but that is solvable.

Yes, makes perfect sense. Just split up the twenty filters into two
times ten filters. One delay is enough (right in the middle of the
even numbered biquads). For an odd number of filters, just add a
"bypass" filter to the end to make the number even (using SIMD, you
need the same number of instructions to process either 2n-1 or 2n
biquads, for all positive n, so adding another biquad won't incur any
overhead).

Regards,
Andor

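One way to read the scheme above, sketched in plain Python (an interpretation, not code from the thread): the cascade is cut into two equal halves, the second half is fed the first half's output from the previous sample, and at each step the j-th biquad of both halves can then be computed side by side. The biquad structure (transposed Direct Form II), the denominator convention 1 + a1 z^-1 + a2 z^-2, and the names `BYPASS`, `biquad`, `plain_cascade` and `split_cascade` are assumptions for illustration.

```python
BYPASS = (1.0, 0.0, 0.0, 0.0, 0.0)  # (b0, b1, b2, a1, a2): output equals input


def biquad(x, c, s):
    """One transposed Direct Form II biquad step; s = [s1, s2] is updated in place."""
    b0, b1, b2, a1, a2 = c
    y = b0 * x + s[0]
    s[0] = b1 * x - a1 * y + s[1]
    s[1] = b2 * x - a2 * y
    return y


def plain_cascade(signal, sections):
    """Reference: every sample runs through all sections in series."""
    states = [[0.0, 0.0] for _ in sections]
    out = []
    for x in signal:
        for c, s in zip(sections, states):
            x = biquad(x, c, s)
        out.append(x)
    return out


def split_cascade(signal, sections):
    """Two half-cascades joined by a single one-sample delay."""
    sections = list(sections)
    if len(sections) % 2:            # odd count: pad with a bypass section
        sections.append(BYPASS)
    half = len(sections) // 2
    first, second = sections[:half], sections[half:]
    st_a = [[0.0, 0.0] for _ in first]
    st_b = [[0.0, 0.0] for _ in second]
    delay = 0.0                      # first-half output from the previous sample
    out = []
    for x in signal:
        a, b = x, delay
        for j in range(half):
            # These two calls do not depend on each other, so on a SIMD
            # machine they could share one instruction stream.
            a = biquad(a, first[j], st_a[j])
            b = biquad(b, second[j], st_b[j])
        delay = a
        out.append(b)
    return out
```

With zero initial state, `split_cascade` should produce the `plain_cascade` output shifted by exactly one sample, regardless of how many sections there are.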
"Andor" <an2or@mailcircuit.com> wrote in message
news:ce45f9ed.0403032258.51c7db8d@posting.google.com...
> Jon Harris wrote:
> ...
> > iii) That may be the most feasible. While latency/delay is an issue in our
> > system, we have an existing mechanism in place to compensate for it. Am I
> > right in supposing that if I processed biquads in pairs, I would only incur
> > an over-all 1-sample delay even if I had dozens of biquads? For example,
> > with a 20-band filter, the even bands from "now" and the odd bands from "one
> > sample ago" would be processed in parallel. Make sense? I would still need
> > to deal with the special case of odd numbers of bands, but that is solvable.
>
> Yes, makes perfect sense. Just split up the twenty filters into two
> times ten filters. One delay is enough (right in the middle of the
> even numbered biquads). For an odd number of filters, just add a
> "bypass" filter to the end to make the number even (using SIMD, you
> need the same number of instructions to process either 2n-1 or 2n
> biquads, for all positive n, so adding another biquad won't incur any
> overhead).

Thanks Andor. Assuming I can deal with the increased latency, that sounds like the best bet by far.