# Parallel implementation of Multirate filter

Started November 26, 2012
```actually, it doesn't sound that bad, as long as I may use 7 parallel
outputs: It unrolls a complete polyphase cycle, and there is no need for
commutator switches anymore.

A quick drawing:
http://www.dsprelated.com/blogimages/MarkusNentwig/comp.dsp/121127.png

The delay lines only show where the input samples end up in the parallel
delay lines once I reorder them linearly in time.
Then, calculate each output according to the equations.
If pipelining becomes necessary, it seems like a relatively straightforward
exercise, because everything runs on the same clock and the coefficients
are now constant.
```
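The unrolling described in the post above can be sketched in Python. This is only an illustration under stated assumptions, not the filter from the drawing: it assumes a 7/8 rate change (8 input samples per clock, 7 outputs per clock), and the names `reference_resample`, `lane_taps`, and `block_resample` are hypothetical. Output lane r always uses the fixed coefficients h[r], h[r+7], h[r+14], ..., which is why no commutator is needed once a full polyphase cycle is unrolled.

```python
def reference_resample(x, h, up=7, down=8):
    """Direct form: y[m] = sum_n h[m*down - n*up] * x[n] (zero-stuff, filter, decimate)."""
    y = []
    for m in range((len(x) * up) // down):
        acc = 0.0
        for n in range(len(x)):
            k = m * down - n * up
            if 0 <= k < len(h):
                acc += h[k] * x[n]
        y.append(acc)
    return y


def lane_taps(h, up=7, down=8):
    """For each output lane r, list (input offset j, coefficient) pairs.

    The offset j is relative to the first sample of the current input block;
    negative offsets reach back into earlier blocks (the delay lines).
    """
    taps = []
    for r in range(up):
        lane = []
        j = (down * r) // up  # newest input sample this lane may use (causality)
        while down * r - up * j < len(h):
            lane.append((j, h[down * r - up * j]))
            j -= 1
        taps.append(lane)
    return taps


def block_resample(x, h, up=7, down=8):
    """Process `down` inputs per step and emit `up` outputs with constant taps."""
    taps = lane_taps(h, up, down)
    y = []
    for start in range(0, len(x) - down + 1, down):
        for lane in taps:
            acc = 0.0
            for j, c in lane:
                idx = start + j
                if idx >= 0:  # samples before the start of the stream count as zero
                    acc += c * x[idx]
            y.append(acc)
    return y
```

Feeding both functions the same input (with a length that is a multiple of 8) should give identical outputs. In hardware, each lane would become its own multiply-accumulate tree fed from the parallel delay lines.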
```On Tue, 27 Nov 2012 14:59:09 -0600, mnentwig wrote:

> actually, it doesn't sound that bad, as long as I may use 7 parallel
> outputs: It unrolls a complete polyphase cycle, and there is no need for
> commutator switches anymore.

Even if you have to use 8 parallel outputs that's an independent problem
to generating the right numbers: just have an internal block that
generates the magic 7, then a totally separate 7-in, 8-out FIFO-ish thing.

> A quick drawing:
> http://www.dsprelated.com/blogimages/MarkusNentwig/comp.dsp/121127.png
>
> The delay lines show only, where the input samples - if I reorder them
> linearly in time - end up in the parallel delay lines. Then, calculate
> each output according to the equations. If pipelining becomes necessary,
> it seems like a relatively straightforward exercise, because everything
> happens at the same clock, and coefficients are now constant.

What you're showing and what I think that the OP is saying are different,
though.  If I'm not mistaken, the ADC puts out samples n, n+1, n+2, ...,
n+7 at once, then waits eight clocks, then does it again.  So your filter
would need to take those all into account (plus the prior collected
vector) to get the seven outputs ready for reordering.

--
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com
```
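The "7-in, 8-out FIFO-ish thing" from the post above can be sketched as a small regrouping buffer. This is a minimal software model, assuming samples simply keep their order; the class name `Gearbox` and its `push`/`pop` methods are hypothetical and say nothing about how the block would be built in an FPGA.

```python
from collections import deque


class Gearbox:
    """Regroup words of n_in samples into words of n_out samples, preserving order."""

    def __init__(self, n_in=7, n_out=8):
        self.n_in = n_in
        self.n_out = n_out
        self.buf = deque()

    def push(self, word):
        """Accept one input word of exactly n_in samples."""
        if len(word) != self.n_in:
            raise ValueError("input word must hold n_in samples")
        self.buf.extend(word)

    def pop(self):
        """Return one output word of n_out samples, or None if too few are buffered."""
        if len(self.buf) < self.n_out:
            return None
        return [self.buf.popleft() for _ in range(self.n_out)]
```

Pushing eight 7-sample words and popping until `None` is returned yields seven 8-sample words, so the filter itself never has to know about the output width.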
```Hello,

>Even if you have to use 8 parallel outputs that's an independent problem
>to generating the right numbers: just have an internal block that
>generates the magic 7, then a totally separate 7-in, 8-out FIFO-ish
>thing.

Yes, it is not exactly as required (8 outputs instead of 7). I'd rather do
the re-ordering separately from the filter if I had to code it. But both
ways should work and be indistinguishable from the outside.

>> If I'm not mistaken, the ADC puts out samples n, n+1, n+2, ..., n+7 at
>> once, then waits eight clocks, then does it again.

Yes, that's what I'm assuming. The oldest sample is at the bottom (x7), the
youngest sample at the top (x0).

```
```On Wed, 28 Nov 2012 00:46:46 -0600, mnentwig wrote:

> Hello,
>
>>Even if you have to use 8 parallel outputs that's an independent problem
>>to generating the right numbers: just have an internal block that
>>generates the magic 7, then a totally separate 7-in, 8-out FIFO-ish
>>thing.
>
> Yes, it is not exactly as required (8 output instead of 7). I'd rather
> do the re-ordering separately from the filter, if I had to code it. But,
> both ways should work, and be indistinguishable from the outside.
>
>>> If I'm not mistaken, the ADC puts out samples n, n+1, n+2, ..., n+7 at
>>> once, then waits eight clocks, then does it again.
>
> Yes, that's what I'm assuming. The oldest sample is at the bottom (x7),
> the youngest sample at the top (x0).

Hah.  I misread your graphic.  All those blocks are just for the delay,
and your output math is done in the equations below.  We've been describing
the same damn thing in different language.

Yes, you're doing exactly what I'd do, although I'd show it differently
in documentation.

The OP hasn't mentioned clock rates yet, but I'm still thinking that if
it's so fast that you need eight parallel lines to get it _into_ the FPGA,
then you'd need to do at least some parallel (and probably pipelined)
work to get it to execute _inside_ of the FPGA.

But that's an implementation detail that depends on actual sampling
rates, and the OP hasn't bothered to share that part of the problem.

```
```Hello All,

This is a really nice discussion, and of amazingly high quality :-)
I have been trying to follow and decode all these ideas over the last few days.

So if I got it right, I should add a dual-port RAM block after the
multirate filter, with Fs1 as the input clock, where 7 samples are written in
parallel at each clock cycle. On the output port I have Fs2, which
reads 8 samples at each clock cycle. As long as 7*Fs1 = 8*Fs2, this will
ensure that the RAM neither fills up nor runs empty.

This way I should get 8 parallel lines on the output with Fs2 as the
sampling rate! Nice... :D

Thanks a lot, amigos, for this rich explanation.

Hassans

```
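The rate balance behind the dual-port RAM idea in the post above can be checked with a toy occupancy model. It is a sketch under stated assumptions, not a hardware design: time is counted in abstract ticks, the write clock has a period of 7 ticks and the read clock a period of 8 ticks (so 7*Fs1 = 8*Fs2), a write is applied before a read when both fall on the same tick, and the function name `fifo_occupancy` and the `read_delay` parameter are hypothetical.

```python
def fifo_occupancy(n_ticks, read_delay, w_period=7, r_period=8, w_width=7, r_width=8):
    """Return the buffer fill level after every tick.

    Every w_period ticks, w_width samples are written.
    Every r_period ticks, starting at read_delay, r_width samples are read.
    A negative value in the trace means the reader ran ahead of the writer.
    """
    occ = 0
    trace = []
    for t in range(n_ticks):
        if t % w_period == 0:
            occ += w_width  # write port: one 7-sample word
        if t >= read_delay and (t - read_delay) % r_period == 0:
            occ -= r_width  # read port: one 8-sample word
        trace.append(occ)
    return trace


if __name__ == "__main__":
    for delay in (0, 7):
        trace = fifo_occupancy(200, delay)
        print("read_delay", delay, "min fill", min(trace), "max fill", max(trace))
```

In this model the fill level repeats every 56 ticks (8 writes of 7 samples against 7 reads of 8 samples), so the memory depth stays bounded. The reader does need a head start, though: with no delay it underflows on the very first read. A real design would also have to handle the clock-domain crossing, which this model ignores.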