Reply by Hassans November 29, 2012
>On Wed, 28 Nov 2012 00:46:46 -0600, mnentwig wrote:
>
>> Hello,
>>
>>>Even if you have to use 8 parallel outputs that's an independent problem
>>>to generating the right numbers: just have an internal block that
>>>generates the magic 7, then a totally separate 7-in, 8-out FIFO-ish
>>>thing.
>>
>> Yes, it is not exactly as required (8 output instead of 7). I'd rather
>> do the re-ordering separately from the filter, if I had to code it. But,
>> both ways should work, and be indistinguishable from the outside.
>>
>>>> If I'm not mistaken, the ADC puts out samples n, n+1, n+2, ..., n+7 at
>>>> once, then waits eight clocks, then does it again.
>>
>> Yes, that's what I'm assuming. The oldest sample is at the bottom (x7),
>> the youngest sample at the top (x0).
>
>Hah. I misread your graphic. All those blocks are just for the delay,
>and your output math is done in math below. We've been describing the
>same damn thing in different language.
>
>Yes, you're doing exactly what I'd do, although I'd show it differently
>in documentation.
>
>The OP hasn't mentioned clock rates yet, but I'm still thinking that if
>it's so fast that you need eight parallel lines to get it _into_ the FPGA,
>then you'd need to do at least some parallel (and probably pipelined)
>work to get it to execute _inside_ of the FPGA.
>
>But that's an implementation detail that depends on actual sampling
>rates, and the OP hasn't bothered to share that part of the problem.
>
>--
>My liberal friends think I'm a conservative kook.
>My conservative friends think I'm a liberal kook.
>Why am I not happy that they have found common ground?
>
>Tim Wescott, Communications, Control, Circuits & Software
>http://www.wescottdesign.com
>
Hello All,

This is really a nice discussion, and amazingly high quality :-) I have been trying to follow and decode all these ideas over the last few days.

So if I got it right, I should put a dual-port RAM block after the multirate filter. The write port is clocked at Fs1, and 7 samples are written in parallel on each clock cycle. The read port is clocked at Fs2, and 8 samples are read on each clock cycle. This will ensure that the RAM is never full or empty at any moment in time.

This way I should get 8 parallel lines on the output with Fs2 as the sampling rate! Nice... :D

Thank you a lot, amigos, for this rich explanation.

Hassans
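The 7-in, 8-out buffering described above can be sketched as a small Python simulation. This is only an illustration under stated assumptions, not anything posted in the thread: it assumes the two ports carry the same average sample rate (7*Fs1 = 8*Fs2), models that with a common tick counter (one 7-sample write every 7 ticks, one 8-sample read every 8 ticks), and lets the read side start a few ticks late. The names `simulate_gearbox` and `read_delay` are made up for the example.

```python
from collections import deque


def simulate_gearbox(samples, w=7, r=8, read_delay=8):
    """Model a w-in / r-out FIFO on a common tick counter (hypothetical helper).

    A w-sample word is written every w ticks and an r-sample word is read
    every r ticks, so both sides average one sample per tick.
    """
    assert len(samples) % (w * r) == 0, "use whole 56-sample cycles"
    fifo = deque()
    words_out = []
    max_fill = 0
    written = 0
    t = 0
    while len(words_out) * r < len(samples):
        # Write side: one w-sample word every w ticks.
        if t % w == 0 and written < len(samples):
            fifo.extend(samples[written:written + w])
            written += w
            max_fill = max(max_fill, len(fifo))
        # Read side: one r-sample word every r ticks, after the start-up delay.
        if t >= read_delay and (t - read_delay) % r == 0:
            if len(fifo) < r:
                raise RuntimeError(f"underflow at tick {t}")
            words_out.append([fifo.popleft() for _ in range(r)])
        t += 1
    return words_out, max_fill


words, max_fill = simulate_gearbox(list(range(112)))
print(len(words), "output words, peak FIFO fill:", max_fill)
```

In this model the sample order is preserved and the fill level stays bounded, but only because the read side is given a head start; with `read_delay=0` the very first read finds 7 samples where it needs 8 and the sketch raises an underflow error.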
Reply by Tim Wescott November 28, 2012
On Wed, 28 Nov 2012 00:46:46 -0600, mnentwig wrote:

> Hello,
>
>>Even if you have to use 8 parallel outputs that's an independent problem
>>to generating the right numbers: just have an internal block that
>>generates the magic 7, then a totally separate 7-in, 8-out FIFO-ish
>>thing.
>
> Yes, it is not exactly as required (8 output instead of 7). I'd rather
> do the re-ordering separately from the filter, if I had to code it. But,
> both ways should work, and be indistinguishable from the outside.
>
>>> If I'm not mistaken, the ADC puts out samples n, n+1, n+2, ..., n+7 at
>>> once, then waits eight clocks, then does it again.
>
> Yes, that's what I'm assuming. The oldest sample is at the bottom (x7),
> the youngest sample at the top (x0).

Hah. I misread your graphic. All those blocks are just for the delay, and your output math is done in math below. We've been describing the same damn thing in different language.

Yes, you're doing exactly what I'd do, although I'd show it differently in documentation.

The OP hasn't mentioned clock rates yet, but I'm still thinking that if it's so fast that you need eight parallel lines to get it _into_ the FPGA, then you'd need to do at least some parallel (and probably pipelined) work to get it to execute _inside_ of the FPGA.

But that's an implementation detail that depends on actual sampling rates, and the OP hasn't bothered to share that part of the problem.

--
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com
Reply by mnentwig November 28, 2012
Hello,

>Even if you have to use 8 parallel outputs that's an independent problem
>to generating the right numbers: just have an internal block that
>generates the magic 7, then a totally separate 7-in, 8-out FIFO-ish
>thing.

Yes, it is not exactly as required (8 output instead of 7). I'd rather do the re-ordering separately from the filter, if I had to code it. But, both ways should work, and be indistinguishable from the outside.

>> If I'm not mistaken, the ADC puts out samples n, n+1, n+2, ..., n+7 at
>> once, then waits eight clocks, then does it again.

Yes, that's what I'm assuming. The oldest sample is at the bottom (x7), the youngest sample at the top (x0).
Reply by Tim Wescott November 27, 2012
On Tue, 27 Nov 2012 14:59:09 -0600, mnentwig wrote:

> actually, it doesn't sound that bad, as long as I may use 7 parallel
> outputs: It unrolls a complete polyphase cycle, and there is no need for
> commutator switches anymore.

Even if you have to use 8 parallel outputs that's an independent problem to generating the right numbers: just have an internal block that generates the magic 7, then a totally separate 7-in, 8-out FIFO-ish thing.

> A quick drawing:
> note, I changed indices to start with 0.
> http://www.dsprelated.com/blogimages/MarkusNentwig/comp.dsp/121127.png
>
> The delay lines show only, where the input samples - if I reorder them
> linearly in time - end up in the parallel delay lines. Then, calculate
> each output according to the equations. If pipelining becomes necessary,
> it seems like a relatively straightforward exercise, because everything
> happens at the same clock, and coefficients are now constant.

What you're showing and what I think that the OP is saying are different, though.

If I'm not mistaken, the ADC puts out samples n, n+1, n+2, ..., n+7 at once, then waits eight clocks, then does it again. So your filter would need to take those all into account (plus the prior collected vector) to get the seven outputs ready for reordering.

--
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com
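To make the "eight samples in, plus the prior collected vector, seven outputs out" idea concrete, here is a minimal Python sketch. It is an illustration only, assuming 0-based indexing, a prototype filter `c` for the interpolate-by-7 stage, and zero initial history; the function name `parallel_7_of_8` is invented for the example and does not come from the thread.

```python
def parallel_7_of_8(blocks, c, up=7, down=8):
    """Consume blocks of `down` inputs, emit `up` outputs per block (sketch).

    Output lane r always uses the same coefficient subset c[r::up], so every
    lane is a small FIR filter with constant coefficients.
    """
    taps = [c[r::up] for r in range(up)]       # fixed coefficient set per lane
    depth = max(len(t) for t in taps) - 1      # older samples a lane may reach
    history = [0] * depth                      # assumption: start from silence
    out_blocks = []
    for block in blocks:
        assert len(block) == down
        window = history + list(block)
        lanes = []
        for r in range(up):
            # Newest input used by lane r; for 7/8 this is simply block[r].
            newest = depth + (down * r) // up
            lanes.append(sum(coef * window[newest - i]
                             for i, coef in enumerate(taps[r])))
        out_blocks.append(lanes)
        history = window[len(window) - depth:]  # keep the tail for next block
    return out_blocks


x = list(range(1, 33))
c = [(5 * n + 3) % 11 - 5 for n in range(20)]   # arbitrary test coefficients
blocks = [x[k:k + 8] for k in range(0, len(x), 8)]
for lanes in parallel_7_of_8(blocks, c):
    print(lanes)
```

Everything here runs once per input block, which mirrors the point made earlier in the thread that all lanes share one clock and need no commutator. The last sample of each block is never the newest input of any lane in that block; it only enters the next block's outputs through the saved history.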
Reply by mnentwig November 27, 2012
Actually, it doesn't sound that bad, as long as I may use 7 parallel
outputs: It unrolls a complete polyphase cycle, and there is no need for
commutator switches anymore.

A quick drawing: 
note, I changed indices to start with 0.
http://www.dsprelated.com/blogimages/MarkusNentwig/comp.dsp/121127.png

The delay lines only show where the input samples end up in the parallel
delay lines once I reorder them linearly in time.
Then, calculate each output according to the equations. 
If pipelining becomes necessary, it seems like a relatively straightforward
exercise, because everything happens at the same clock, and coefficients
are now constant.
Reply by Tim Wescott November 27, 2012
On Tue, 27 Nov 2012 04:37:27 -0600, mnentwig wrote:

> what I'd do is to start with a conventional polyphase interpolate-by-7
> FIR filter:
>
> y1 = x1 c1 + x2  c8 + x3 c15 + x4 c22 + x5 c29 + ...
> y2 = x1 c2 + x2  c9 + x3 c16 + x4 c23 + x5 c30 + ...
> y3 = x1 c3 + x2 c10 + x3 c17 + x4 c24 + x5 c33 + ...
>
> then decimate by 8. That is, calculate only y1, y9, y17, y25 etc, and
> discard the equations in-between.
>
> At this point I've got a single-input single-output 7 up 8 down
> polyphase resampler.
> Next, take the remaining equations for eight consecutive (decimated) y
> samples, and implement them in parallel. What remains is to distribute
> the parallel inputs, a mere formality.
>
> This is just a quick "lunch break study", maybe someone else comes up
> with a better solution.

That looks right to me, once I got my morning-impaired brain wrapped around it.

Hassans: Don't be surprised if you need to save the previous eight samples or perhaps more: that's kind of a requirement for any filtering. Internally you'll probably want a step in your pipeline that has a vector of the current N * 8 samples all lined up and ready to go into whatever the next step in the pipeline is.

If the data is going by too fast to be put into a serial stream then even after you get the algorithm ironed out it's still going to be challenging to get the filter working. I foresee a lot of pipelining, a big FPGA, and a lot of picky book-keeping to get the filter to execute fast enough.

I strongly suggest that you make sure that you have a very firm grasp of the algorithm you're trying to implement before you start trying to make it work at speed and with the 7:8 decimation in your output clock. I'd probably want to make sure that I had an accurate behavioral representation simulated in the HDL of my choice that was rock-solid and bone-headed (i.e., optimize the code for transparency and readability, with no attempt to make it synthesizable). Then when I'd made the "real" filter I'd make sure to test against the "bone-head" implementation in simulation.

Failing an HDL test article, I'd make sure to simulate the filter action in Matlab or Scilab (or Excel or whatever) and test the input/output behavior of the synthesizable model against that.

--
My liberal friends think I'm a conservative kook.
My conservative friends think I'm a liberal kook.
Why am I not happy that they have found common ground?

Tim Wescott, Communications, Control, Circuits & Software
http://www.wescottdesign.com
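The "bone-head reference first, then test the real thing against it" workflow can be tried out in software before any HDL exists. The following Python sketch is one possible way to do that, under assumptions of my own: 0-based indexing, integer test data so the comparison is exact, and made-up function names (`resample_reference`, `resample_polyphase`). The reference does the literal zero-stuff, filter, discard sequence; the second function computes only the outputs that survive decimation.

```python
def resample_reference(x, c, up=7, down=8):
    """Transparent model: zero-stuff by `up`, FIR filter, keep every `down`-th."""
    stuffed = []
    for s in x:
        stuffed.append(s)
        stuffed.extend([0] * (up - 1))
    y = []
    for m in range(len(stuffed)):
        acc = 0
        for n, coef in enumerate(c):
            if m - n >= 0:
                acc += coef * stuffed[m - n]
        y.append(acc)
    return y[::down]


def resample_polyphase(x, c, up=7, down=8):
    """Faster model: evaluate only the polyphase equations that are kept."""
    n_out = (len(x) * up + down - 1) // down
    out = []
    for j in range(n_out):
        m = down * j                     # position in the interpolated stream
        phase, newest = m % up, m // up  # coefficient phase, newest input index
        acc = 0
        i = 0
        while phase + up * i < len(c) and newest - i >= 0:
            acc += c[phase + up * i] * x[newest - i]
            i += 1
        out.append(acc)
    return out


x = list(range(1, 41))
c = [(3 * n * n + 5 * n + 1) % 17 - 8 for n in range(21)]  # arbitrary integers
assert resample_polyphase(x, c) == resample_reference(x, c)
print("polyphase model matches the reference on", len(x), "input samples")
```

The same pattern carries over to an HDL testbench: feed identical stimulus to both models and compare sample by sample, starting with impulses and ramps before moving to realistic signals.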
Reply by mnentwig November 27, 2012
Still, for a structure that generates 7 output samples by consuming 8 input
samples, it seems more straightforward to design for 7 parallel outputs.

You'd get 7 simple filters with fixed coefficients, and have a common clock
rate at input and output. 
What remains to be done for 8 outputs is to shuffle whole samples from 7 to
8 parallel registers at a slightly lower rate.

Both approaches will do the same, but I think I'd prefer the second one for
implementation. 
Apparently, this is the approach taken in the paper (I haven't read it yet,
it has to wait until dinner) :-)

Reply by DougB November 27, 2012
>what I'd do is to start with a conventional polyphase interpolate-by-7 FIR
>filter:
>
>y1 = x1 c1 + x2  c8 + x3 c15 + x4 c22 + x5 c29 + ...
>y2 = x1 c2 + x2  c9 + x3 c16 + x4 c23 + x5 c30 + ...
>y3 = x1 c3 + x2 c10 + x3 c17 + x4 c24 + x5 c33 + ...
>
>then decimate by 8. That is, calculate only y1, y9, y17, y25 etc, and
>discard the equations in-between.
>
>At this point I've got a single-input single-output 7 up 8 down polyphase
>resampler.
>Next, take the remaining equations for eight consecutive (decimated) y
>samples, and implement them in parallel. What remains is to distribute the
>parallel inputs, a mere formality.
>
>This is just a quick "lunch break study", maybe someone else comes up with
>a better solution.
>

Good lunch break study - this is precisely how it should be done. Make sure to design your filter with a convenient number of coefficients.

-Doug
Reply by mnentwig November 27, 2012
>> c33

Correction: that should be c31 (the last term shown in the y3 equation is x5 c31).
Reply by mnentwig November 27, 2012
what I'd do is to start with a conventional polyphase interpolate-by-7 FIR
filter:

y1 = x1 c1 + x2  c8 + x3 c15 + x4 c22 + x5 c29 + ...
y2 = x1 c2 + x2  c9 + x3 c16 + x4 c23 + x5 c30 + ...
y3 = x1 c3 + x2 c10 + x3 c17 + x4 c24 + x5 c33 + ...

then decimate by 8. That is, calculate only y1, y9, y17, y25 etc, and
discard the equations in-between.

At this point I've got a single-input single-output 7 up 8 down polyphase
resampler.
Next, take the remaining equations for eight consecutive (decimated) y
samples, and implement them in parallel. What remains is to distribute the
parallel inputs, a mere formality.

This is just a quick "lunch break study", maybe someone else comes up with
a better solution.
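
To see which equations survive the decimation, the bookkeeping can be written out in a few lines of Python. This is a hedged sketch rather than part of the original post: it uses 0-based indices (so the post's y1, y9, y17 become interpolated positions 0, 8, 16), calls the kept outputs z[j], and the helper name `resampler_schedule` is invented here.

```python
def resampler_schedule(n_outputs, up=7, down=8, taps_per_phase=3):
    """List, for each kept output, which x[k]*c[n] terms it sums (sketch).

    Kept output j sits at position m = down*j of the interpolated stream.
    Its coefficient phase is m % up and its newest input is x[m // up].
    """
    rows = []
    for j in range(n_outputs):
        m = down * j
        phase = m % up
        newest = m // up
        terms = [(newest - i, phase + up * i) for i in range(taps_per_phase)]
        rows.append((j, phase, newest, terms))
    return rows


# Negative x indices stand for samples from before the start (history).
for j, phase, newest, terms in resampler_schedule(8):
    text = " + ".join(f"x[{k}]*c[{n}]" for k, n in terms)
    print(f"z[{j}] = {text} + ...")
```

With up = 7 and down = 8 the coefficient phase steps through 0..6 and then repeats, so the pattern closes after 7 outputs, during which the newest-input index has advanced by 8. That is why each output lane can be given a fixed coefficient set.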