DSPRelated.com
Forums

Cascading FIR

Started by Vasilius September 25, 2016
Hello

I have two DSP connected in serial mode.
Have FIR with 2000 coefficients .
Need to recalc and divide coefficients of FIR by 1000 for 2 DSP, so that
in serial application give result of FIR with 2000 points.
Delay of FIR here dont work, bcs FIR of each DSP is 1000 points maximum. 


Is this possible?

WBR, Vasilius.


---------------------------------------
Posted through http://www.DSPRelated.com
On 25.09.2016 17:47, Vasilius wrote:
> Hello > > I have two DSP connected in serial mode. > Have FIR with 2000 coefficients . > Need to recalc and divide coefficients of FIR by 1000 for 2 DSP, so that > in serial application give result of FIR with 2000 points. > Delay of FIR here dont work, bcs FIR of each DSP is 1000 points maximum. > > > Is this possible? > > WBR, Vasilius. > > > --------------------------------------- > Posted through http://www.DSPRelated.com >
Assuming you have H(z) = sum (in k from 1 to 2000) a_k * z^(-k), you can split it in multiple ways. The simplest would be H(z) = H1(z) + H2(z)*z^(-1000), where H1(z) = sum (in k from 1 to 1000) a_k * z^(-k), and H2(z) = sum (in k from 1 to 1000) a_{k+1000} * z^(-k) It's not exactly "serial", but you just need to add the output of filter H1 with the delayed output of filter H2. I'm afraid to ask why do you need a FIR with 2000 coefficients... Gene
On 25.09.2016 18:50, Evgeny Filatov wrote:
> On 25.09.2016 17:47, Vasilius wrote: >> Hello >> >> I have two DSP connected in serial mode. >> Have FIR with 2000 coefficients . >> Need to recalc and divide coefficients of FIR by 1000 for 2 DSP, so that >> in serial application give result of FIR with 2000 points. >> Delay of FIR here dont work, bcs FIR of each DSP is 1000 points maximum. >> >> >> Is this possible? >> >> WBR, Vasilius. >> >> >> --------------------------------------- >> Posted through http://www.DSPRelated.com >> > > Assuming you have H(z) = sum (in k from 1 to 2000) a_k * z^(-k), you can > split it in multiple ways. The simplest would be > H(z) = H1(z) + H2(z)*z^(-1000), where > H1(z) = sum (in k from 1 to 1000) a_k * z^(-k), and > H2(z) = sum (in k from 1 to 1000) a_{k+1000} * z^(-k) > > It's not exactly "serial", but you just need to add the output of filter > H1 with the delayed output of filter H2. > > I'm afraid to ask why do you need a FIR with 2000 coefficients... > > Gene >
Perhaps I have misunderstood your question. If you have H(z) = sum (in k from 0 to 1999) a_k * z^(-k), and you want to partition it as H(z) = H1(z) * H2(z), where H1(z) = sum (in k from 0 to 1000) b_k * z^(-k), and H2(z) = sum (in k from 0 to 999) c_k * z^(-k). Where you know coefficients a_k, and want to determine coefficients b_k and c_k. Then it's a pretty straightforward task! According to the fundamental theorem of algebra the polynomial H(z) can be partitioned as a multiplication of first-order complex polynomials, or second-order real polynomials. So you just need to find the pairs of complex zeros of H(z), and then split them in any convenient way to generate two polynomials of less order. There would be lots of ways to do that, so perhaps you'd like to choose some which minimize largest values of coefficients in the resulting filters. However, the question was whether it's possible or not, and yes, it is possible. HTH, Gene
On 25.09.2016 19:24, Evgeny Filatov wrote:
> On 25.09.2016 18:50, Evgeny Filatov wrote: >> On 25.09.2016 17:47, Vasilius wrote: >>> Hello >>> >>> I have two DSP connected in serial mode. >>> Have FIR with 2000 coefficients . >>> Need to recalc and divide coefficients of FIR by 1000 for 2 DSP, so that >>> in serial application give result of FIR with 2000 points. >>> Delay of FIR here dont work, bcs FIR of each DSP is 1000 points maximum. >>> >>> >>> Is this possible? >>> >>> WBR, Vasilius. >>> >>> >>> --------------------------------------- >>> Posted through http://www.DSPRelated.com >>> >> >> Assuming you have H(z) = sum (in k from 1 to 2000) a_k * z^(-k), you can >> split it in multiple ways. The simplest would be >> H(z) = H1(z) + H2(z)*z^(-1000), where >> H1(z) = sum (in k from 1 to 1000) a_k * z^(-k), and >> H2(z) = sum (in k from 1 to 1000) a_{k+1000} * z^(-k) >> >> It's not exactly "serial", but you just need to add the output of filter >> H1 with the delayed output of filter H2. >> >> I'm afraid to ask why do you need a FIR with 2000 coefficients... >> >> Gene >> > > Perhaps I have misunderstood your question. If you have H(z) = sum (in k > from 0 to 1999) a_k * z^(-k), and you want to partition it as > H(z) = H1(z) * H2(z), where > H1(z) = sum (in k from 0 to 1000) b_k * z^(-k), and > H2(z) = sum (in k from 0 to 999) c_k * z^(-k). > Where you know coefficients a_k, and want to determine coefficients b_k > and c_k. > > Then it's a pretty straightforward task! According to the fundamental > theorem of algebra the polynomial H(z) can be partitioned as a > multiplication of first-order complex polynomials, or second-order real > polynomials. So you just need to find the pairs of complex zeros of > H(z), and then split them in any convenient way to generate two > polynomials of less order. > > There would be lots of ways to do that, so perhaps you'd like to choose > some which minimize largest values of coefficients in the resulting > filters. > > However, the question was whether it's possible or not, and yes, it is > possible. > > HTH, > Gene >
Or perhaps (if it's some standard case like low-pass or band-pass) you'd have more luck if you go to the frequency domain, take a square root of the frequency response, and try to design a lesser filter with 1000 coefficients satisfying the lesser requirements of a square-root response. Gene
On Sun, 25 Sep 2016 09:47:29 -0500, "Vasilius" <116196@DSPRelated>
wrote:

>Hello > >I have two DSP connected in serial mode. >Have FIR with 2000 coefficients . >Need to recalc and divide coefficients of FIR by 1000 for 2 DSP, so that >in serial application give result of FIR with 2000 points. >Delay of FIR here dont work, bcs FIR of each DSP is 1000 points maximum. > > >Is this possible?
Is the goal really to just spread the processing across two DSPs, or is there some contraint that requires the filter computation be split up like you're describing? Are the filter coefficients symmetric? Depending on the answers to the above, there are still lots of ways to split a long filter between two DSPs. Some of it depends on what communication capabilities there are between the two DSPs, but if you have a blank slate then a lot is possible. 1. I think the most direct answer to your question is to split the partial product calculations between the two processors. In other words, the sum of the products of the input with the first 1000 coeffiicents is computed in one, and the second 1000 in the other, and, somewhere those two partial products get added together to to form the result. This is definitely do-able and seems to most directly answer your question if I understand it correctly. It may not be the "best" way to do it depending on the system, though. 2. Calculate every other output of the entire filter in each DSP. e.g., calculate the even outputs in one and the odd outputs in the other. This is a 50% load reduction in each processor compared to using one DSP. 3. If the coefficients are symmetric, there is a very common trick to fold the filter and achieve a 50% reduction in the number of multiplies required. If you still need to split load across two processors, either of the tricks in 1 or 2 will still work on the folded filter. There are lots more variations on the above, but I hope you get the idea.
>WBR, Vasilius. > > >--------------------------------------- >Posted through http://www.DSPRelated.com
On 25.09.2016 21:27, eric.jacobsen@ieee.org wrote:
> > 2. Calculate every other output of the entire filter in each DSP. > e.g., calculate the even outputs in one and the odd outputs in the > other. This is a 50% load reduction in each processor compared to > using one DSP. >
I'm afraid polyphase partitioning only works that way in case (in the original design) filtering is followed by a decimation by two. If there's no decimation, that would be still a couple of 2000-coefficient FIR filters each with a half of coefficients equal to zero. Gene
On Sun, 25 Sep 2016 21:43:10 +0300, Evgeny Filatov
<filatov.ev@mipt.ru> wrote:

>On 25.09.2016 21:27, eric.jacobsen@ieee.org wrote: >> >> 2. Calculate every other output of the entire filter in each DSP. >> e.g., calculate the even outputs in one and the odd outputs in the >> other. This is a 50% load reduction in each processor compared to >> using one DSP. >> > >I'm afraid polyphase partitioning only works that way in case (in the >original design) filtering is followed by a decimation by two. If >there's no decimation, that would be still a couple of 2000-coefficient >FIR filters each with a half of coefficients equal to zero.
I think you misunderstood somehow. Each filter computes half the output samples. The full stream is recovered by interleaving the output of both processors back to the full stream. This is easily done in a FIR by advancing the samples two stages through the delay line between output samples rather than one, which is a pretty simple thing to do in software.
On 25.09.2016 23:47, eric.jacobsen@ieee.org wrote:
> On Sun, 25 Sep 2016 21:43:10 +0300, Evgeny Filatov > <filatov.ev@mipt.ru> wrote: > >> On 25.09.2016 21:27, eric.jacobsen@ieee.org wrote: >>> >>> 2. Calculate every other output of the entire filter in each DSP. >>> e.g., calculate the even outputs in one and the odd outputs in the >>> other. This is a 50% load reduction in each processor compared to >>> using one DSP. >>> >> >> I'm afraid polyphase partitioning only works that way in case (in the >> original design) filtering is followed by a decimation by two. If >> there's no decimation, that would be still a couple of 2000-coefficient >> FIR filters each with a half of coefficients equal to zero. > > I think you misunderstood somehow. Each filter computes half the > output samples. The full stream is recovered by interleaving the > output of both processors back to the full stream. > > This is easily done in a FIR by advancing the samples two stages > through the delay line between output samples rather than one, which > is a pretty simple thing to do in software. > >
Thanks for clarification, Eric. I didn't have any idea about whether the OP's FIRs are software or hardware. Gene
If you use the normal forward FIR structure, you have to send the delayed-by-1000 input data as well as the partial sum-of-products from dsp1 to dsp2, unless you don't care about wasting memory in dsp2. However if you use the transposed form of FIR filtering then you only have to send one number across the boundary. 

Another consideration is that in fixed-point math, the accumulator often overflows or under-flows many times during the sequence of multiply-accumulates but as long as it comes back within range at the end, you don't care. So when you break the FIR in two, you may need to transmit all the bits of the accumulator from dsp1 to dsp2, which might require multiple data transactions, assuming the dsp allows all bits of the accumulator to be sent to the serial port, which is not always the case. 

Bob
On Sun, 25 Sep 2016 23:58:18 +0300, Evgeny Filatov
<filatov.ev@mipt.ru> wrote:

>On 25.09.2016 23:47, eric.jacobsen@ieee.org wrote: >> On Sun, 25 Sep 2016 21:43:10 +0300, Evgeny Filatov >> <filatov.ev@mipt.ru> wrote: >> >>> On 25.09.2016 21:27, eric.jacobsen@ieee.org wrote: >>>> >>>> 2. Calculate every other output of the entire filter in each DSP. >>>> e.g., calculate the even outputs in one and the odd outputs in the >>>> other. This is a 50% load reduction in each processor compared to >>>> using one DSP. >>>> >>> >>> I'm afraid polyphase partitioning only works that way in case (in the >>> original design) filtering is followed by a decimation by two. If >>> there's no decimation, that would be still a couple of 2000-coefficient >>> FIR filters each with a half of coefficients equal to zero. >> >> I think you misunderstood somehow. Each filter computes half the >> output samples. The full stream is recovered by interleaving the >> output of both processors back to the full stream. >> >> This is easily done in a FIR by advancing the samples two stages >> through the delay line between output samples rather than one, which >> is a pretty simple thing to do in software. >> >> > >Thanks for clarification, Eric. I didn't have any idea about whether the >OP's FIRs are software or hardware.
I think usually a DSP means a DSP processor, like a 56k or a Sharc or something, which means software. Either way, it's also easy in hardware.