DSPRelated.com
Forums

suggestions for FIR hardware implementation?

Started by dtsao April 12, 2007
Hi,

Does anyone have suggestions on efficient ways to implement an FIR filter
in Verilog? My filter will need to be 256 taps and run at 27 MHz (so I
guess not that fast). I have already implemented an FIR using Distributed
Arithmetic, but would like to know of any other methods you think might be
good. Thanks.

dtsao wrote:

> does anyone have suggestions on efficient ways to implement an FIR filter
> in verilog? My filter will need to be 256 taps, and run at 27 Mhz (so I
> guess not that fast). I have already implemented an FIR using Distributed
> Arithmetic, but would like to know any other methods you think might be
> good. Thanks.
You minimize hardware by iterating a single multiplier through all the taps. You maximize speed by implementing each tap separately in hardware. (You can use a loop in Verilog; it will expand out to separate hardware units.) What you want is somewhere in between.

Figure out how fast you can run the tap hardware, divide by 27 MHz, and round down; that's how many taps you can do with each processing unit. Maybe subtract one to be sure. Divide 256 by that number and round up for the number of processing units needed.

-- glen
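The arithmetic above can be sketched as a quick sizing calculation (Python for clarity; the 200 MHz MAC clock is an assumed illustrative figure, not from the thread):

```python
import math

def processing_units(num_taps, sample_rate_hz, mac_clock_hz, margin=1):
    """Estimate how many multiply-accumulate (MAC) units are needed when
    each unit is time-shared across several taps.
    mac_clock_hz is the clock rate the tap hardware can close timing at."""
    # Round down, and subtract one tap per unit as a safety margin.
    taps_per_unit = mac_clock_hz // sample_rate_hz - margin
    # Round up to cover all taps.
    units = math.ceil(num_taps / taps_per_unit)
    return taps_per_unit, units

# Assuming the MAC hardware runs at 200 MHz with a 27 MHz sample rate:
print(processing_units(256, 27_000_000, 200_000_000))  # -> (6, 43)
```

So under that assumed 200 MHz figure, each MAC handles 6 taps and 43 MACs cover all 256 taps.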
"dtsao" <tsaodave@yahoo.ca> wrote in message 
news:H_2dnagdL_ru7IPbnZ2dnUVZ_hCdnZ2d@giganews.com...
> Hi,
>
> does anyone have suggestions on efficient ways to implement an FIR filter
> in verilog? My filter will need to be 256 taps, and run at 27 Mhz (so I
> guess not that fast). I have already implemented an FIR using Distributed
> Arithmetic, but would like to know any other methods you think might be
> good. Thanks.
I can give you some ideas, but I'm not such a Verilog expert:

Many FIR filters are symmetric. If multiplies are expensive relative to adds, you might delay and add the symmetric sample pairs first, then multiply once per pair. This has been done by others.

Again, if multiplies are expensive relative to adds, you might round the coefficients off to sums of powers of two (SPT) and implement the multiplies with shifts and adds. There's a fair amount of literature on getting *optimum* coefficients, but I find that a spreadsheet works pretty well for deciding which coefficients to use. This has also been done by others. (Before doing this, scale the coefficients so that the largest coefficient(s) *are* a power of two; then, depending on the arithmetic, all you have to do for those taps is add the sample(s). Example: for an original coefficient of 0.7, use (1.0 - 0.25) or (0.5 + 0.25) to get an approximated coefficient of 0.75.)

If you want to get fancy using SPT, here is a suboptimum method using the Remez exchange algorithm (somewhat modified). Choose a maximum quantization level for the coefficients, then:

1) Find an optimum filter using the algorithm.
2) Using the quantization level you decided on, search for the coefficient (or coefficient pair) which, when quantized all by itself, yields the highest error in the filter response.
3) Choose the direction of quantization for that one coefficient that yields the lowest error.
4) "Fix" that one coefficient at the quantized value.
5) Subtract the contribution of that one coefficient from the desired filter response, forming a new desired function.
6) Design an optimum filter, leaving out the basis function for the coefficient that's been fixed.
7) Repeat from step 2, reducing the number of variable coefficients by one and increasing the number of fixed coefficients by one at each iteration.

I keep meaning to implement this and test it. For now it remains an unproven idea.
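The first two tricks above (symmetric folding and SPT coefficients) can be sketched in Python to show the arithmetic; a real design would of course be Verilog, and the greedy SPT decomposition here is only one simple way to pick the terms:

```python
import math

def symmetric_fir(samples, half_coeffs):
    """Linear-phase FIR with folded symmetry: add each symmetric sample
    pair first, then multiply once -- halving the multiplier count.
    Assumes an even-length filter whose second half mirrors the first."""
    n = 2 * len(half_coeffs)
    return sum(c * (samples[k] + samples[n - 1 - k])
               for k, c in enumerate(half_coeffs))

def to_spt(c, terms=2):
    """Greedily approximate coefficient c as a sum of `terms` signed
    powers of two, so each multiply becomes a few shifts and adds.
    (A spreadsheet or exhaustive search may do better; this is a sketch.)"""
    parts, r = [], c
    for _ in range(terms):
        if r == 0:
            break
        # Nearest signed power of two to the remaining residue.
        p = math.copysign(2.0 ** round(math.log2(abs(r))), r)
        parts.append(p)
        r -= p
    return parts

print(to_spt(0.7))  # -> [0.5, 0.25], i.e. the 0.75 from the example above
```

With `[0.5, 0.25]`, the 0.7 tap becomes one right-shift-by-1 and one right-shift-by-2 feeding an adder, and no multiplier at all.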
My experience with the algorithms suggests it will work well. The disadvantage is that you have to select the quantization level up front (although one could certainly play with ways to adjust that). The advantage is that the "optimization" will be much faster than a search for the optimum solution: for a length-N symmetric filter, the compute time would be comparable to designing N/2 filters of ever-decreasing length, starting at length N.

Why do we quantize the *worst* contributor to error first? Because, philosophically, if we're going to quantize the filter coefficients anyway, we may as well get the worst one out of the way first; subsequent optimizations may then compensate for it somewhat. (There could be a proof that this is nonsense, or not, or one or the other in specific cases.)

Fred
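Steps 2-4 of the procedure can be sketched in Python as a toy. This sketch deliberately omits the Remez redesign of steps 5-6 (which would need a full Parks-McClellan implementation) and measures error against the original coefficients on a frequency grid, so it only illustrates the "worst contributor first" ordering, not the complete method:

```python
import cmath
import math

def response_error(h, h_ref, n_freqs=64):
    """Max |H(w) - H_ref(w)| over a grid of frequencies in [0, pi]."""
    worst = 0.0
    for i in range(n_freqs):
        w = math.pi * i / (n_freqs - 1)
        diff = sum((a - b) * cmath.exp(-1j * w * k)
                   for k, (a, b) in enumerate(zip(h, h_ref)))
        worst = max(worst, abs(diff))
    return worst

def quantize_worst_first(h0, step):
    """Greedy 'worst contributor first' coefficient quantization
    (steps 2-4 above); the redesign of steps 5-6 is omitted."""
    h = list(h0)
    fixed = [False] * len(h)
    for _ in range(len(h)):
        # Step 2: find the free coefficient whose quantization, all by
        # itself, produces the largest response error.
        worst_i, worst_err = -1, -1.0
        for i in range(len(h)):
            if fixed[i]:
                continue
            trial = list(h)
            trial[i] = step * round(h[i] / step)
            err = response_error(trial, h0)
            if err > worst_err:
                worst_i, worst_err = i, err
        # Step 3: round that coefficient in whichever direction yields
        # the lower error.
        lo = step * math.floor(h[worst_i] / step)
        best = min((lo, lo + step), key=lambda q: response_error(
            [q if j == worst_i else h[j] for j in range(len(h))], h0))
        # Step 4: fix it at the quantized value.
        h[worst_i] = best
        fixed[worst_i] = True
    return h
```

Every returned coefficient lands on a multiple of `step`, within one `step` of its original value; whether this ordering actually beats naive rounding would need the full redesign loop that Fred describes.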