Hi People,

I'm trying to think of an efficient way to implement a Shaped OQPSK modulator in hardware (FPGA or CPLD). My first thought was to use an FIR filter. However, since there are only two input bits per symbol, there are only a relatively small number of output states and (it seems) a LUT could be much more efficient.

Has anyone thought through this a little better and have any pointers or suggestions?

--
Randy Yates
Digital Signal Labs
mailto://yates@ieee.org
http://www.digitalsignallabs.com
% "And all you had to say was that you were gonna stay."
% 'Getting To The Point', *Balance of Power*, ELO

# Lookup Table Vs. FIR For Modulator Pulse Shaping

Started by ●March 27, 2010

Posted by ●March 27, 2010

On 3/27/2010 6:36 PM, Randy Yates wrote:

> I'm trying to think of an efficient way to implement a Shaped OQPSK
> modulator in hardware (FPGA or CPLD). My first thought was to use an FIR
> filter. However, since there are only two input bits per symbol, there
> are only a relatively small number of output states and (it seems) a LUT
> could be much more efficient.
>
> Has anyone thought through this a little better and have any pointers
> or suggestions?

A LUT is a very common way to implement a modulator. You only need enough address bits to cover N symbols plus however many phases you want per symbol (four is often enough). You can even add address bits for different pulse shapes or modulation orders.

Even in an FPGA, if the symbol rate is fixed, this is often the most efficient way to do it.

--
Eric Jacobsen
Minister of Algorithms
Abineau Communications
http://www.abineau.com
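Eric's table-driven modulator can be sketched in software. The sketch below builds a LUT addressed by the last SPAN symbol bits plus a phase index, so streaming the bit history through the table reproduces the shaped output with no multiplies at runtime. The root-raised-cosine pulse, SPAN = 4, and PHASES = 4 samples per symbol are illustrative assumptions, not values from the thread, and this shapes a single BPSK-style rail (OQPSK would run two such rails with the Q rail offset by half a symbol).

```python
import numpy as np

# Illustrative parameters (not from the thread): pulse spans SPAN symbols,
# PHASES output samples per symbol.
SPAN, PHASES = 4, 4

def rrc(t, beta=0.35):
    """Root-raised-cosine pulse, unit symbol period, with singular points handled."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    for i, x in enumerate(t):
        if abs(x) < 1e-9:
            out[i] = 1 - beta + 4 * beta / np.pi
        elif abs(abs(4 * beta * x) - 1) < 1e-9:
            out[i] = (beta / np.sqrt(2)) * ((1 + 2/np.pi) * np.sin(np.pi/(4*beta))
                                            + (1 - 2/np.pi) * np.cos(np.pi/(4*beta)))
        else:
            out[i] = (np.sin(np.pi*x*(1-beta)) + 4*beta*x*np.cos(np.pi*x*(1+beta))) \
                     / (np.pi*x*(1 - (4*beta*x)**2))
    return out

# Pulse samples: PHASES per symbol, centred over SPAN symbols.
t = np.arange(SPAN * PHASES) / PHASES - SPAN / 2
h = rrc(t)

# Build the LUT: address = (SPAN history bits, phase index). Each entry is the
# precomputed filter output for that bit pattern at that sample phase.
lut = np.empty((2**SPAN, PHASES))
for addr in range(2**SPAN):
    bits = [(addr >> k) & 1 for k in range(SPAN)]   # bit k = symbol k symbols ago
    amps = [2*b - 1 for b in bits]                  # map {0,1} -> {-1,+1}
    for ph in range(PHASES):
        lut[addr, ph] = sum(a * h[k*PHASES + ph] for k, a in enumerate(amps))

def modulate(bits):
    """Shape one rail by streaming bits through a shift register and the LUT."""
    out, hist = [], 0
    for b in bits:
        hist = ((hist << 1) | b) & (2**SPAN - 1)    # SPAN-bit symbol history
        out.extend(lut[hist, ph] for ph in range(PHASES))
    return np.array(out)

samples = modulate([1, 0, 1, 1, 0, 1, 0, 0])
```

Once the history register is full, the output is identical to upsampling the ±1 symbols and convolving with the pulse; the table here has only 2^4 × 4 = 64 entries, which is the small size that makes the LUT attractive over a MAC-based FIR.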

Posted by ●March 28, 2010

Randy Yates <yates@ieee.org> wrote:

> I'm trying to think of an efficient way to implement a Shaped OQPSK
> modulator in hardware (FPGA or CPLD). My first thought was to use an FIR
> filter. However, since there are only two input bits per symbol, there
> are only a relatively small number of output states and (it seems) a LUT
> could be much more efficient.

Well, the primary logic unit of an FPGA is a look-up table. If you are multiplying by constants, especially with a two-bit input, then yes, look-up tables are likely the best way.

It is usual for FPGAs to have flip-flops at the output of each LUT, which makes systolic array pipelines real easy to build. If you latch at each logic level, they are really fast, too!

Not knowing OQPSK, can you describe a little what expression you need evaluated?

-- glen

Posted by ●March 28, 2010

Randy Yates wrote:

> I'm trying to think of an efficient way to implement a Shaped OQPSK
> modulator in hardware (FPGA or CPLD). My first thought was to use an FIR
> filter. However, since there are only two input bits per symbol, there
> are only a relatively small number of output states and (it seems) a LUT
> could be much more efficient.
>
> Has anyone thought through this a little better and have any pointers
> or suggestions?

Using a LUT for modulators is pretty common; see the G3RUH GMSK modem for an example. However, FPGA utilization is going to be rather inefficient if you need a LUT with more than, say, 256 entries.

I cheer those who work on weekends.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com

Posted by ●March 28, 2010

On Sun, 28 Mar 2010 06:59:26 -0500, Vladimir Vassilevsky <nospam@nowhere.com> wrote:

> I cheer those who work on weekends.

What's a weekend?

Greg

Posted by ●March 28, 2010

> Randy Yates <yates@ieee.org> wrote:
>
>> I'm trying to think of an efficient way to implement a Shaped OQPSK
>> modulator in hardware (FPGA or CPLD). My first thought was to use an FIR
>> filter. However, since there are only two input bits per symbol, there
>> are only a relatively small number of output states and (it seems) a LUT
>> could be much more efficient.
>
> Well, the primary logic unit of an FPGA is look-up table.
>
> If you are multiplying by constants, especially with a two bit
> input, yes, look-up tables are likely the best way.
>
> Not knowing OQPSK, can you describe a little what expression
> you need evaluated?
>
> -- glen

QAM shaping by root raised cosine is commonly done in a multiplierless transposed FIR structure using precomputed products (LUTs). If you are also upsampling by 2 (the least practical requirement), then you can split the structure into two polyphases.

You need as many LUTs as the longest polyphase has taps, with each LUT small enough to cover your symbol levels. The symbol bit pattern can then be used to address all the LUTs and feed the products into the serial chain of adders. This is far more efficient than using multipliers. It also suits several QAM modes, and you can change the LUT contents readily for roll-off or gain control, etc.

kaz
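kaz's precomputed-product structure can be sketched in software: one small LUT per tap holds tap × level for every possible symbol, so the filter runs on lookups and adds only. The 5-tap filter and 4-level (2-bit) alphabet below are illustrative assumptions, not values from the thread.

```python
import numpy as np

# Illustrative taps and a 2-bit (4-level) symbol alphabet.
taps = np.array([0.1, 0.4, 0.8, 0.4, 0.1])
levels = np.array([-3.0, -1.0, 1.0, 3.0])

# Precompute the products: tap_luts[k][s] = taps[k] * levels[s].
# Each per-tap LUT has only len(levels) entries.
tap_luts = [c * levels for c in taps]

def shape(symbols):
    """Multiplierless FIR: shift the symbol history, look up, and add."""
    out, hist = [], [0] * len(taps)          # history holds symbol indices
    for s in symbols:
        hist = [s] + hist[:-1]
        out.append(sum(tap_luts[k][hist[k]] for k in range(len(taps))))
    return np.array(out)
```

With 2x upsampling, the even- and odd-indexed taps would form the two polyphase LUT banks kaz describes, both addressed by the same symbol bit pattern; in hardware the `sum` becomes the serial chain of adders.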

Posted by ●March 28, 2010

Greg Berchin wrote:

>> I cheer those who work on weekends.
>
> What's a weekend?

A periodical thing when the other folks are trying to distract you from work.

VLV

Posted by ●March 29, 2010

Eric Jacobsen <eric.jacobsen@ieee.org> writes:

> A LUT is a very common way to implement a modulator. You only need
> enough address bits to cover N symbols plus however many phases you
> want per symbol (four is often enough). You can even add address bits
> for different pulse shapes or modulation orders.
>
> Even in an FPGA, if the symbol rate is fixed, this is often the most
> efficient way to do it.

Right. The basic thing I was wrapping my brain around was the completely different viewpoint of an FIR as a finite state machine (a Mealy FSM, to be precise). I'd never thought of it that way.

For a K-bit input, there are 2^(K*N) states, where N is the FIR length. Therefore a LUT with K*N address inputs and L output bits suffices to describe the filter.

When K and N are big (say, K = 16 and N = 32), it's impractical to use a LUT. But for small values it becomes feasible; maybe even preferable...

OK, I think I'm now sufficiently out of the rut of "everything is a MAC"...

--
Randy Yates
Digital Signal Labs
mailto://yates@ieee.org
http://www.digitalsignallabs.com
% "Bird, on the wing, goes floating by, but there's a teardrop in his eye..."
% 'One Summer Dream', *Face The Music*, ELO
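Randy's FIR-as-FSM view can be sketched directly: enumerate all 2^(K*N) symbol histories once, store each output in a flat table, and then run the filter by shifting K bits into an address register and indexing. K = 2, N = 3, and the taps below are small illustrative values (the table has 2^(2*3) = 64 entries); for Randy's K = 16, N = 32 the table would need 2^512 entries, which is exactly why the LUT only wins for small K*N.

```python
import numpy as np

# Illustrative values: K-bit symbols, length-N FIR, 2**K amplitude levels.
K, N = 2, 3
taps = np.array([0.25, 1.0, 0.25])
levels = np.array([-3.0, -1.0, 1.0, 3.0])

# Enumerate every state: each address packs N K-bit symbols, newest in bits 0..K-1.
lut = np.empty(2 ** (K * N))
for addr in range(2 ** (K * N)):
    syms = [(addr >> (K * k)) & (2**K - 1) for k in range(N)]  # k = symbols ago
    lut[addr] = sum(taps[k] * levels[s] for k, s in enumerate(syms))

def fir_as_fsm(symbols):
    """Run the filter purely by shifting address bits and indexing the LUT."""
    out, addr = [], 0
    mask = 2 ** (K * N) - 1
    for s in symbols:
        addr = ((addr << K) | s) & mask     # Mealy FSM: state is the history
        out.append(lut[addr])
    return np.array(out)
```

After the first N - 1 symbols (while the address register fills), the outputs match a conventional MAC-based FIR over the same level-mapped symbol stream exactly.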

Posted by ●March 29, 2010

Randy Yates wrote:

> For a K-bit input, there are 2^(K*N) states where N is the FIR length.
> Therefore a LUT with K*N address inputs and L output bits suffices to
> describe the filter.
>
> When K and N are big (say, K = 16 and N = 32), it's impractical to use a
> LUT. But for small values it becomes feasible; maybe even preferable...
>
> OK, I think I'm now sufficiently out of the rut of "everything is a
> MAC"...

Are there any math/theorems that describe the behavior of an FIR filter as a state machine (trellis?)? Could there be a direct relation between, say, the state variable (2**(K*N)) and the output?

If there were a direct relationship between the state variable and the output, a large state variable and the corresponding state machine might be reasonable resource usage. I think from this perspective a traditional FIR maintains multiple pieces of state information and computes the output from them?

chris

Posted by ●April 1, 2010

glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:

> It is usual for FPGAs to have flip-flops at the output of each
> LUT, which makes systolic array pipelines real easy to build.
> If you latch at each logic level, they are really fast, too!

Doesn't the usual definition of "systolic array" require that there is no globally synchronous clock, and that instead the clock follows the data around? And if so, would one really design with an FPGA in such a fashion?

Steve

Posted by ●April 4, 2010

Jerry Avins <jya@ieee.org> wrote:

> Asynchronous operation is common in hardware FIFOs. I may have built the
> first with discrete IC logic, and I recall a few LSI designs. Chuck
> Moore of Forth fame builds CPUs that way.

Async operation is also common in all modern DRAMs. In fact, I would say that the current prevalent ASIC timing model, which is timing closure, is an asynchronous approach when compared with earlier disciplines such as IBM's LSSD or Mead-Conway non-overlapping clocks. Those earlier approaches tried to make an entire chip, or at least more of a chip, synchronous. This is now no longer done.

So in a sense, the systolic asynchronous array approach won the architecture battle. It's just not done explicitly at the architecture level; it's implemented in the backend.

Note to all: I tried to wedge things such that I could go to Kansas City, but it failed, so I won't be there. I wish I could, and I thank all the organizers and contributors. Have a grand time.

S.

Posted by ●April 1, 2010

On 4/1/2010 8:17 PM, glen herrmannsfeldt wrote:

> Steve Pope <spope33@speedymail.org> wrote:
> (snip regarding systolic arrays and global clocks)
>
>> We did a Reed-Solomon decoder with such a clocking approach once.
>> It is written up in the chapter by Berlekamp et al. in Stephen Wicker's
>> book _Reed Solomon Codes_.
>
> The ones I know have data going only one direction, which allows
> for various different clock methods. Among others, you can latch
> all the signals at any point and delay them. There are systolic
> arrays with data going both directions, though. Also 2D arrays,
> with data flow in orthogonal directions.

Asynchronous operation is common in hardware FIFOs. I may have built the first with discrete IC logic, and I recall a few LSI designs. Chuck Moore of Forth fame builds CPUs that way.

Jerry
--
"It does me no injury for my neighbor to say there are 20 gods, or no God. It neither picks my pocket nor breaks my leg." -- Thomas Jefferson to the Virginia House of Delegates in 1776.

Posted by ●April 1, 2010

Steve Pope <spope33@speedymail.org> wrote:
(snip regarding systolic arrays and global clocks)

> We did a Reed-Solomon decoder with such a clocking approach once.
> It is written up in the chapter by Berlekamp et al. in Stephen Wicker's
> book _Reed Solomon Codes_.
>
> It was necessary because we used ECL logic and it was attractive to
> not have to run an entire large board synchronously. However, each
> FPGA was itself synchronous.

The ones I know have data going only one direction, which allows for various different clock methods. Among others, you can latch all the signals at any point and delay them. There are systolic arrays with data going both directions, though, and also 2D arrays with data flowing in orthogonal directions.

For systolic array search processors, the array length determines the allowed query length. I have known virtual arrays that run the data through and store all the array output, then load the next part of the query and run the data through that, again storing the array output.

-- glen

Posted by ●April 1, 2010

glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:

> Steve Pope <spope33@speedymail.org> wrote:
>
>> Doesn't the usual definition of "systolic array" require that
>> there is no globally synchronous clock, that instead the clock
>> follows the data around?
>
> All the ones I know have a global clock.
>
>> And if so, would one really design with an FPGA in such a fashion?
>
> Systolic array is more an architecture than an implementation.
> One could clock them differently, and in some cases I think it
> is useful. For example, in a multi-board array, having a separate
> clock for each board, synchronized with the data, seems like it
> might work better than a global (multi-board) clock.

We did a Reed-Solomon decoder with such a clocking approach once. It is written up in the chapter by Berlekamp et al. in Stephen Wicker's book _Reed Solomon Codes_.

It was necessary because we used ECL logic and it was attractive to not have to run an entire large board synchronously. However, each FPGA was itself synchronous.

Steve

Posted by ●April 1, 2010

Steve Pope <spope33@speedymail.org> wrote:

> Doesn't the usual definition of "systolic array" require that
> there is no globally synchronous clock, that instead the clock
> follows the data around?

All the ones I know have a global clock.

> And if so, would one really design with an FPGA in such a fashion?

Systolic array is more an architecture than an implementation. One could clock them differently, and in some cases I think it is useful. For example, in a multi-board array, having a separate clock for each board, synchronized with the data, seems like it might work better than a global (multi-board) clock.

-- glen

Posted by ●April 1, 2010

cfelton <cfelton@n_o_s_p_a_m.ieee.org> wrote:

> Is there any math / theorems that describe the behavior of a FIR filter as
> a state-machine (trellis?). Could there exist a direct relation between,
> say the state variable (2**(K*N)), and the output?

I'm not sure if there are any theorems surrounding it, but what you are describing sounds like the concept behind a Forney equalizer.

Steve
