I have implemented various IIR filters, up to 16th order, as a series of biquads in cascade. However, I now am attempting to optimize this code to run on a SIMD processor (SHARC 21161). For that purpose, it appears that a parallel filter form would be much better suited since it would allow working on 2 paths in parallel. Is there any simple way to convert my cascade coefficients to parallel coefficients? I need to do this in "real-time" as my filters can be arbitrarily adjusted by the user in real-time. The literature seems to use partial fraction expansion to do this, but that doesn't seem too friendly to a real-time implementation. If anyone has any other advice on implementing IIR biquads to take advantage of a SIMD architecture, that would be appreciated as well. FYI, the application is audio. -Jon
Convert cascade to parallel
Started by ●February 23, 2004
Reply by ●March 2, 2004
Jon Harris wrote:
> I have implemented various IIR filters, up to 16th order, as a series of
> biquads in cascade. However, I now am attempting to optimize this code to
> run on a SIMD processor (SHARC 21161). For that purpose, it appears that a
> parallel filter form would be much better suited since it would allow
> working on 2 paths in parallel.

Hi Jon,

my newsreader shows no answer to your post. Just to keep the track record of comp.dsp up, I'll give it a shot :).

I disagree with your assumption that the best way to utilize SIMD for IIR filters is to parallelize the filters. This messes up your filter architecture (bad, since you probably use one optimized for minimum noise or minimum instructions). Rather, I would suggest the following approaches:

i) If it is possible, process two (or multiples of two) channels with the same processor - this is the easiest way to utilize SIMD.

ii) Write your own IIR biquad routine using SIMD (you should be able to cut down from 4 to 2 instructions, or perhaps 3, I haven't done this) - or maybe you can download optimized SIMD biquad code from ADI.

iii) By using additional delays (one between each biquad), you can actually use SIMD to calculate two biquads simultaneously with the standard IIR biquad code (just load the input to each biquad from the delay).

Regards,
Andor
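Approach (i) can be sketched in plain C as two channels stepping through the same cascade in lockstep. This is only an illustration under stated assumptions: it uses Direct Form I sections for brevity, scalar C rather than SHARC SIMD instructions, and the names `BiquadCoef`, `BiquadState2` and `stereo_cascade_tick` are hypothetical, not taken from any ADI library.

```c
#include <stddef.h>

/* Coefficients shared by both channels (a0 normalized to 1).
   Hypothetical type, for illustration only. */
typedef struct { double b0, b1, b2, a1, a2; } BiquadCoef;

/* Per-section state, one slot per channel (lane 0 = left, lane 1 = right). */
typedef struct { double x1[2], x2[2], y1[2], y2[2]; } BiquadState2;

/* Run one sample of both channels through a cascade of n sections, in place. */
static void stereo_cascade_tick(const BiquadCoef *c, BiquadState2 *s,
                                size_t n, double io[2])
{
    for (size_t i = 0; i < n; ++i) {
        /* Identical arithmetic on each lane with the same coefficients:
           this inner loop is the part a SIMD unit could do in one pass. */
        for (int lane = 0; lane < 2; ++lane) {
            double x = io[lane];
            double y = c[i].b0 * x
                     + c[i].b1 * s[i].x1[lane] + c[i].b2 * s[i].x2[lane]
                     - c[i].a1 * s[i].y1[lane] - c[i].a2 * s[i].y2[lane];
            s[i].x2[lane] = s[i].x1[lane];  s[i].x1[lane] = x;
            s[i].y2[lane] = s[i].y1[lane];  s[i].y1[lane] = y;
            io[lane] = y;
        }
    }
}
```

Because the two lanes never read each other's state, no extra delay is needed in this case; the only requirement is that both channels use the same number of sections.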
Reply by ●March 3, 2004
Thanks for your reply, Andor. Indeed, it has been the only one. Your suggestions are good ones, though some are not practical for my application. A few comments:

i) My application calls for both "mono" and "stereo" filters. As you can imagine, the stereo filters were very easy to convert to SIMD and yield an excellent performance increase. However, I am now trying to deal with the mono ones. A possibility would be to offer an additional "dual mono" filter that allows different filtering on each channel. (The existing stereo module performs identical filtering on 2 channels.) Unfortunately, this requires quite a bit more work than just re-writing the DSP code, and users would need to be savvy enough to take advantage of it.

ii) I've looked into this a bit, but there doesn't seem to be any easy solution. I'm using the "normalized 4-multiply lattice/ladder" form (not Direct Form), and so far I haven't found any way to effectively "SIMD-ize" it. The ideas I have come up with would only save a few cycles per band but would also cause increased overhead to the point of not making it worthwhile.

iii) That may be the most feasible. While latency/delay is an issue in our system, we have an existing mechanism in place to compensate for it. Am I right in supposing that if I processed biquads in pairs, I would only incur an overall 1-sample delay even if I had dozens of biquads? For example, with a 20-band filter, the even bands from "now" and the odd bands from "one sample ago" would be processed in parallel. Make sense? I would still need to deal with the special case of odd numbers of bands, but that is solvable.

"Andor Bariska" <andor@nospam.net> wrote in message news:4044a37d$1@pfaff2.ethz.ch...
> Jon Harris wrote:
> > I have implemented various IIR filters, up to 16th order, as a series of
> > biquads in cascade. However, I now am attempting to optimize this code to
> > run on a SIMD processor (SHARC 21161). For that purpose, it appears that a
> > parallel filter form would be much better suited since it would allow
> > working on 2 paths in parallel.
>
> Hi Jon,
>
> my newsreader shows no answer to your post. Just to keep the track
> record of comp.dsp up, I'll give it a shot :).
>
> I disagree with your assumption that the best way to utilize SIMD for
> IIR filters is to parallelize the filters. This messes up your filter
> architecture (bad, since you probably use one optimized for minimum
> noise or minimum instructions). Rather, I would suggest the following
> approaches:
>
> i) If it is possible, process two (or multiples of two) channels with
> the same processor - this is the easiest way to utilize SIMD.
>
> ii) Write your own IIR biquad routine using SIMD (you should be able to
> cut down from 4 to 2 instructions, or perhaps 3, I haven't done this) -
> or maybe you can download optimized SIMD biquad code from ADI.
>
> iii) By using additional delays (one between each biquad), you can
> actually use SIMD to calculate two biquads simultaneously with the
> standard IIR biquad code (just load the input to each biquad from the
> delay).
>
> Regards,
> Andor
Reply by ●March 4, 2004
Jon Harris wrote:
...
> iii) That may be the most feasible. While latency/delay is an issue in our
> system, we have an existing mechanism in place to compensate for it. Am I
> right in supposing that if I processed biquads in pairs, I would only incur
> an overall 1-sample delay even if I had dozens of biquads? For example,
> with a 20-band filter, the even bands from "now" and the odd bands from "one
> sample ago" would be processed in parallel. Make sense? I would still need
> to deal with the special case of odd numbers of bands, but that is solvable.

Yes, makes perfect sense. Just split up the twenty filters into two times ten filters. One delay is enough (right in the middle of the even numbered biquads). For an odd number of filters, just add a "bypass" filter to the end to make the number even (using SIMD, you need the same number of instructions to process either 2n-1 or 2n biquads, for all positive n, so adding another biquad won't incur any overhead).

Regards,
Andor
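The split described above can be sketched in plain C as follows. This is a minimal illustration under stated assumptions, not SHARC code: it uses Direct Form I sections for brevity (the filters in this thread use a normalized lattice/ladder form), scalar arithmetic instead of SIMD instructions, and hypothetical names (`Biquad`, `SplitCascade`, `split_tick`, `BYPASS`). The point is that the two updates inside the loop do not depend on each other, so a SIMD unit could execute them as a pair, at the cost of one sample of extra latency.

```c
#include <assert.h>
#include <stddef.h>

/* One Direct Form I section (a0 normalized to 1). Illustrative only. */
typedef struct {
    double b0, b1, b2, a1, a2;   /* coefficients */
    double x1, x2, y1, y2;       /* state */
} Biquad;

/* Unity-gain section used to pad an odd section count to an even one. */
static const Biquad BYPASS = {1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0};

static double biquad_tick(Biquad *q, double x)
{
    double y = q->b0 * x + q->b1 * q->x1 + q->b2 * q->x2
             - q->a1 * q->y1 - q->a2 * q->y2;
    q->x2 = q->x1;  q->x1 = x;
    q->y2 = q->y1;  q->y1 = y;
    return y;
}

/* Reference: ordinary serial cascade of n sections. */
static double cascade_tick(Biquad *s, size_t n, double x)
{
    for (size_t i = 0; i < n; ++i)
        x = biquad_tick(&s[i], x);
    return x;
}

/* Cascade split into two halves joined by a single one-sample delay. */
typedef struct {
    Biquad *first;    /* sections 0 .. half-1 */
    Biquad *second;   /* sections half .. 2*half-1 (may end with BYPASS) */
    size_t  half;
    double  mid;      /* first half's output from the previous sample */
} SplitCascade;

static double split_tick(SplitCascade *c, double x)
{
    assert(c->half > 0);
    double a = x;        /* lane A: current input sample */
    double b = c->mid;   /* lane B: delayed output of the first half */
    for (size_t i = 0; i < c->half; ++i) {
        /* Independent updates: candidates for one SIMD pair per iteration. */
        a = biquad_tick(&c->first[i], a);
        b = biquad_tick(&c->second[i], b);
    }
    c->mid = a;          /* consumed by lane B on the next call */
    return b;            /* serial cascade's output, one sample late */
}
```

With three real sections, the second half would hold one real section followed by a copy of `BYPASS`, and the output of `split_tick` should match the serial cascade delayed by exactly one sample regardless of how many sections there are.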
Reply by ●March 8, 2004
"Andor" <an2or@mailcircuit.com> wrote in message news:ce45f9ed.0403032258.51c7db8d@posting.google.com...
> Jon Harris wrote:
> ...
> > iii) That may be the most feasible. While latency/delay is an issue in our
> > system, we have an existing mechanism in place to compensate for it. Am I
> > right in supposing that if I processed biquads in pairs, I would only incur
> > an overall 1-sample delay even if I had dozens of biquads? For example,
> > with a 20-band filter, the even bands from "now" and the odd bands from "one
> > sample ago" would be processed in parallel. Make sense? I would still need
> > to deal with the special case of odd numbers of bands, but that is solvable.
>
> Yes, makes perfect sense. Just split up the twenty filters into two
> times ten filters. One delay is enough (right in the middle of the
> even numbered biquads). For an odd number of filters, just add a
> "bypass" filter to the end to make the number even (using SIMD, you
> need the same number of instructions to process either 2n-1 or 2n
> biquads, for all positive n, so adding another biquad won't incur any
> overhead).

Thanks Andor. Assuming I can deal with the increased latency, that sounds like the best bet by far.