Lookup Table Vs. FIR For Modulator Pulse Shaping

Started by Randy Yates March 27, 2010
Hi People,

I'm trying to think of an efficient way to implement a Shaped OQPSK
modulator in hardware (FPGA or CPLD). My first thought was to use an FIR
filter. However, since there are only two input bits per symbol, there
are only a relatively small number of output states and (it seems) a LUT
could be much more efficient.

Has anyone thought through this a little better and have any pointers
or suggestions?
-- 
Randy Yates                      % "And all you had to say
Digital Signal Labs              %  was that you were 
mailto://yates@ieee.org          %  gonna stay."
http://www.digitalsignallabs.com % 'Getting To The Point', *Balance of Power*, ELO
On 3/27/2010 6:36 PM, Randy Yates wrote:
> Hi People,
>
> I'm trying to think of an efficient way to implement a Shaped OQPSK
> modulator in hardware (FPGA or CPLD). My first thought was to use an FIR
> filter. However, since there are only two input bits per symbol, there
> are only a relatively small number of output states and (it seems) a LUT
> could be much more efficient.
>
> Has anyone thought through this a little better and have any pointers
> or suggestions?
A LUT is a very common way to implement a modulator. You only need enough
address bits to cover N symbols plus however many phases you want per
symbol (four is often enough). You can even add address bits for different
pulse shapes or modulation orders.

Even in an FPGA, if the symbol rate is fixed, this is often the most
efficient way to do it.

-- 
Eric Jacobsen
Minister of Algorithms
Abineau Communications
http://www.abineau.com
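A minimal sketch of the table Eric describes, for one rail (I or Q) of the modulator. The half-sine pulse, four samples per symbol, and a four-symbol history are placeholder assumptions, not details from the thread; a real design would use the intended OQPSK shaping pulse.

```python
import math

SPAN = 4          # pulse length in symbols covered by the table
SPS = 4           # samples (phases) per symbol
PULSE_LEN = SPAN * SPS

# Stand-in pulse shape (half-sine window); substitute the real shaping pulse.
pulse = [math.sin(math.pi * (n + 0.5) / PULSE_LEN) for n in range(PULSE_LEN)]

# Precompute the table: one output sample per (bit-history, phase) pair.
# Address width = SPAN history bits + log2(SPS) phase bits.
table = []
for hist in range(1 << SPAN):
    bits = [2 * ((hist >> k) & 1) - 1 for k in range(SPAN)]  # map 0/1 -> -1/+1
    for phase in range(SPS):
        # superpose the SPAN overlapping pulses at this output phase
        table.append(sum(bits[k] * pulse[k * SPS + phase] for k in range(SPAN)))

def modulate(bit_stream):
    """Yield SPS shaped samples per input bit, by table lookup only."""
    hist = 0
    for b in bit_stream:
        hist = ((hist << 1) | b) & ((1 << SPAN) - 1)
        for phase in range(SPS):
            yield table[hist * SPS + phase]

out = list(modulate([1, 0, 1, 1]))   # 4 bits -> 16 samples
```

The table here has 2^SPAN * SPS = 64 entries; adding address bits for a second pulse shape or modulation order, as Eric suggests, just doubles it per extra bit.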
Randy Yates  wrote:
 
> I'm trying to think of an efficient way to implement a Shaped OQPSK
> modulator in hardware (FPGA or CPLD). My first thought was to use an FIR
> filter. However, since there are only two input bits per symbol, there
> are only a relatively small number of output states and (it seems) a LUT
> could be much more efficient.
Well, the primary logic unit of an FPGA is a look-up table. If you are
multiplying by constants, especially with a two-bit input, then yes,
look-up tables are likely the best way.

It is usual for FPGAs to have flip-flops at the output of each LUT, which
makes systolic array pipelines real easy to build. If you latch at each
logic level, they are really fast, too!

Not knowing OQPSK, can you describe a little what expression you need
evaluated?

-- glen
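glen's point about constant multiplies with two-bit inputs can be sketched as follows: each tap's multiplier collapses to a 4-entry product table. The coefficients and sample values are arbitrary placeholders.

```python
coeffs = [3, -7, 12, -7, 3]   # arbitrary placeholder coefficients

# One 4-entry product table per tap; the index is the 2-bit input sample,
# taken here as an unsigned value in {0, 1, 2, 3}.
product_luts = [[c * x for x in range(4)] for c in coeffs]

def fir_lut(samples):
    """Direct-form FIR in which every multiply is a table lookup."""
    delay = [0] * len(coeffs)
    out = []
    for s in samples:
        delay = [s] + delay[:-1]   # shift the delay line
        out.append(sum(product_luts[k][delay[k]] for k in range(len(coeffs))))
    return out
```

In an FPGA each 4-entry table maps naturally onto the fabric LUTs glen mentions, with the flip-flop after each LUT available for pipelining.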

Randy Yates wrote:

> Hi People,
>
> I'm trying to think of an efficient way to implement a Shaped OQPSK
> modulator in hardware (FPGA or CPLD). My first thought was to use an FIR
> filter. However, since there are only two input bits per symbol, there
> are only a relatively small number of output states and (it seems) a LUT
> could be much more efficient.
>
> Has anyone thought through this a little better and have any pointers
> or suggestions?
Using a LUT for modulators is pretty common; see the G3RUH GMSK modem, for
example. However, FPGA utilization is going to be rather inefficient if
you need a LUT with more than, say, 256 entries.

I cheer those who work on weekends.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
On Sun, 28 Mar 2010 06:59:26 -0500, Vladimir Vassilevsky wrote:

>I cheer those who work on weekends.
What's a weekend?

Greg
>Randy Yates wrote:
>
>> I'm trying to think of an efficient way to implement a Shaped OQPSK
>> modulator in hardware (FPGA or CPLD). My first thought was to use an FIR
>> filter. However, since there are only two input bits per symbol, there
>> are only a relatively small number of output states and (it seems) a LUT
>> could be much more efficient.
>
>Well, the primary logic unit of an FPGA is a look-up table.
>
>If you are multiplying by constants, especially with a two bit
>input, yes, look-up tables are likely the best way.
>
>It is usual for FPGAs to have flip-flops at the output of each
>LUT, which makes systolic array pipelines real easy to build.
>If you latch at each logic level, they are really fast, too!
>
>Not knowing OQPSK, can you describe a little what expression
>you need evaluated?
>
>-- glen
QAM shaping by root raised cosine is commonly done in a multiplierless
transposed FIR structure using precomputed products (LUTs). If you are also
upsampling by 2 (the least practical requirement), then you can split the
structure into two polyphases. You need as many LUTs as the longest
polyphase has taps, with each LUT small enough to suit your levels. The
symbol bit pattern can then be used to address all the LUTs and feed the
products into the serial chain of adders. This is far more efficient than
using multipliers. It also suits several QAM modes, and you can change the
LUTs readily for rolloff or gain control, etc.

kaz
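A rough behavioral model of the structure kaz describes: a transposed-form FIR whose tap products come from small per-tap LUTs addressed by the 2-bit symbol code, split into two polyphases for upsampling by 2. The taps below are placeholders, not a designed root-raised-cosine filter.

```python
taps = [-1, 2, 6, 10, 6, 2, -1, -2]   # placeholder taps, NOT a designed RRC
levels = [-3, -1, 1, 3]               # 4-level (2-bit) symbol alphabet

# Polyphase split for upsampling by 2: even-index and odd-index taps.
phase_taps = [taps[0::2], taps[1::2]]

# One small LUT per tap per phase: that tap's product with each symbol
# level, addressed directly by the 2-bit symbol code.
luts = [[[t * v for v in levels] for t in ph] for ph in phase_taps]

def shape_phase(codes, tap_luts):
    """Transposed-form FIR: every product is a lookup, never a multiply."""
    n = len(tap_luts)
    regs = [0] * (n - 1)               # partial sums in the adder chain
    out = []
    for code in codes:
        prods = [lut[code] for lut in tap_luts]
        out.append(prods[0] + regs[0])
        for k in range(n - 2):         # shift sums one stage down the chain
            regs[k] = prods[k + 1] + regs[k + 1]
        regs[n - 2] = prods[n - 1]
    return out

def shape(symbol_codes):
    """Interleave the two polyphases: 2 output samples per symbol."""
    even = shape_phase(symbol_codes, luts[0])
    odd = shape_phase(symbol_codes, luts[1])
    out = []
    for a, b in zip(even, odd):
        out += [a, b]
    return out
```

Changing modulation order, rolloff, or gain only means reloading the `luts` contents, as kaz notes; the adder chain itself is untouched.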

Greg Berchin wrote:

> On Sun, 28 Mar 2010 06:59:26 -0500, Vladimir Vassilevsky wrote:
>
>>I cheer those who work on weekends.
>
> What's a weekend?
A periodic thing, when the other folks are trying to distract you from work.

VLV
Eric Jacobsen  writes:

> On 3/27/2010 6:36 PM, Randy Yates wrote:
>> Hi People,
>>
>> I'm trying to think of an efficient way to implement a Shaped OQPSK
>> modulator in hardware (FPGA or CPLD). My first thought was to use an FIR
>> filter. However, since there are only two input bits per symbol, there
>> are only a relatively small number of output states and (it seems) a LUT
>> could be much more efficient.
>>
>> Has anyone thought through this a little better and have any pointers
>> or suggestions?
>
> A LUT is a very common way to implement a modulator. You only need
> enough address bits to cover N symbols plus however many phases you
> want per symbol (four is often enough). You can even add address bits
> for different pulse shapes or modulation orders.
>
> Even in an FPGA, if the symbol rate is fixed, this is often the most
> efficient way to do it.
Right. The basic thing I was wrapping my brain around was the completely
different viewpoint of an FIR as a finite state machine (a Mealy FSM, to be
precise). I'd never thought of it that way.

For a K-bit input, there are 2^(K*N) states, where N is the FIR length.
Therefore a LUT with K*N address inputs and L output bits suffices to
describe the filter.

When K and N are big (say, K = 16 and N = 32), it's impractical to use a
LUT. But for small values it becomes feasible; maybe even preferable...

OK, I think I'm now sufficiently out of the rut of "everything is a
MAC"...
-- 
Randy Yates                      % "Bird, on the wing,
Digital Signal Labs              %  goes floating by
mailto://yates@ieee.org          %  but there's a teardrop in his eye..."
http://www.digitalsignallabs.com % 'One Summer Dream', *Face The Music*, ELO
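The viewpoint Randy describes can be made concrete with small placeholder values of K, N, and coefficients: the whole filter is exhaustively precomputed into a 2^(K*N)-entry ROM addressed by the input history.

```python
K, N = 2, 3            # K-bit samples, length-N FIR
coeffs = [5, -2, 7]    # placeholder coefficients

def unpack(addr):
    """Split a K*N-bit address into N samples of K bits, newest first."""
    return [(addr >> (K * k)) & ((1 << K) - 1) for k in range(N)]

# Precompute the output for every possible input history: the whole
# filter becomes a 2^(K*N) = 64-entry ROM.
rom = [sum(c * s for c, s in zip(coeffs, unpack(a)))
       for a in range(1 << (K * N))]

def fir_rom(samples):
    """Run the filter with no arithmetic at all: shift bits in, look up."""
    addr, out = 0, []
    for s in samples:
        addr = ((addr << K) | s) & ((1 << (K * N)) - 1)
        out.append(rom[addr])
    return out
```

At K = 16, N = 32 the ROM would need 2^512 entries, matching Randy's point that the trick only pays off for small K and N.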
Randy Yates wrote:

>... an FIR as a finite state machine (a Mealy FSM, to be
>precise). I'd never thought of it that way.
>
>For a K-bit input, there are 2^(K*N) states where N is the FIR length.
>Therefore a LUT with K*N address inputs and L output bits suffices to
>describe the filter.
>
>When K and N are big (say, K = 16 and N = 32), it's impractical to use a
>LUT. But for small values it becomes feasible; maybe even preferable...
>
>OK, I think I'm now sufficiently out of the rut of "everything is a
>MAC"...
Are there any theorems that describe the behavior of an FIR filter as a
state machine (a trellis?)? Could there be a direct relation between, say,
the state variable (2**(K*N)) and the output?

If there were such a direct relationship, a large state variable and its
corresponding state machine might be a reasonable use of resources. I
think, from this perspective, a traditional FIR maintains multiple pieces
of state information and computes the output from them?

chris
glen herrmannsfeldt   wrote:

>It is usual for FPGAs to have flip-flops at the output of each
>LUT, which makes systolic array pipelines real easy to build.
>If you latch at each logic level, they are really fast, too!
Doesn't the usual definition of "systolic array" require that there is no
globally synchronous clock, that instead the clock follows the data
around? And if so, would one really design with an FPGA in such a fashion?

Steve
Jerry Avins   wrote:

>Asynchronous operation is common in hardware FIFOs. I may have built the
>first with discrete IC logic, and I recall a few LSI designs. Chuck
>Moore of Forth fame builds CPUs that way.
Async operation is also common in all modern DRAMs. In fact, I would say
that the currently prevalent ASIC timing model, timing closure, is an
asynchronous approach compared with earlier disciplines such as IBM's LSSD
or Mead-Conway non-overlapping clocks. Those earlier approaches tried to
make an entire chip, or at least more of a chip, synchronous. This is no
longer done. So in a sense, the systolic asynchronous array approach won
the architecture battle; it's just not done explicitly at the architecture
level, it's implemented in the backend.

Note to all: I tried to wedge things such that I could go to Kansas City,
but it failed, so I won't be there. I wish I could, and I thank all the
organizers and contributors. Have a grand time.

S.
On 4/1/2010 8:17 PM, glen herrmannsfeldt wrote:
> Steve Pope wrote:
> (snip regarding systolic arrays and global clocks)
>
>> We did a Reed-Solomon decoder with such a clocking approach once.
>> It is written up in the chapter by Berlekamp et al. in Stephen Wicker's
>> book _Reed-Solomon Codes_.
>
>> It was necessary because we used ECL logic and it was attractive to
>> not have to run an entire large board synchronously. However, each
>> FPGA was itself synchronous.
>
> The ones I know have data going only one direction, which allows
> for various different clock methods. Among others, you can latch
> all the signals at any point and delay them. There are systolic
> arrays with data going both directions, though. Also 2D arrays,
> with data flow in orthogonal directions.
>
> For systolic array search processors the array length determines
> the allowed query length. I have known virtual arrays that run
> the data through and store all the array output, then load the
> next part of the query and run the data through that, again storing
> the array output.
Asynchronous operation is common in hardware FIFOs. I may have built the
first with discrete IC logic, and I recall a few LSI designs. Chuck Moore
of Forth fame builds CPUs that way.

Jerry
-- 
"It does me no injury for my neighbor to say there are 20 gods, or no God.
It neither picks my pocket nor breaks my leg." Thomas Jefferson to the
Virginia House of Delegates in 1776.
Steve Pope  wrote:
(snip regarding systolic arrays and global clocks)
 
> We did a Reed-Solomon decoder with such a clocking approach once.
> It is written up in the chapter by Berlekamp et al. in Stephen Wicker's
> book _Reed-Solomon Codes_.
>
> It was necessary because we used ECL logic and it was attractive to
> not have to run an entire large board synchronously. However, each
> FPGA was itself synchronous.
The ones I know have data going only one direction, which allows for
various different clock methods. Among others, you can latch all the
signals at any point and delay them. There are systolic arrays with data
going both directions, though. Also 2D arrays, with data flow in
orthogonal directions.

For systolic array search processors, the array length determines the
allowed query length. I have known virtual arrays that run the data
through and store all the array output, then load the next part of the
query and run the data through that, again storing the array output.

-- glen
glen herrmannsfeldt   wrote:

>Steve Pope wrote:
>> Doesn't the usual definition of "systolic array" require that
>> there is no globally synchronous clock, that instead the clock
>> follows the data around?
>All the ones I know have a global clock.
>> And if so, would one really design with an FPGA in such a fashion?
>Systolic array is more an architecture than an implementation.
>One could clock them differently, and in some cases I think it
>is useful. For example, in a multi-board array, having a separate
>clock for each board, synchronized with the data, seems like it
>might work better than a global (multi-board) clock.
We did a Reed-Solomon decoder with such a clocking approach once. It is
written up in the chapter by Berlekamp et al. in Stephen Wicker's book
_Reed-Solomon Codes_.

It was necessary because we used ECL logic and it was attractive to not
have to run an entire large board synchronously. However, each FPGA was
itself synchronous.

Steve
Steve Pope  wrote:
> glen herrmannsfeldt wrote:
>>It is usual for FPGAs to have flip-flops at the output of each
>>LUT, which makes systolic array pipelines real easy to build.
>>If you latch at each logic level, they are really fast, too!
> Doesn't the usual definition of "systolic array" require that
> there is no globally synchronous clock, that instead the clock
> follows the data around?
All the ones I know have a global clock.
> And if so, would one really design with an FPGA in such a fashion?
Systolic array is more an architecture than an implementation. One could
clock them differently, and in some cases I think it is useful. For
example, in a multi-board array, having a separate clock for each board,
synchronized with the data, seems like it might work better than a global
(multi-board) clock.

-- glen
cfelton  wrote:

>Are there any theorems that describe the behavior of an FIR filter as
>a state machine (a trellis?)? Could there be a direct relation between,
>say, the state variable (2**(K*N)) and the output?
I'm not sure if there are any theorems surrounding it, but what you are
describing sounds like the concept behind a Forney equalizer.

Steve