DSPRelated.com
Forums

suggestions for FIR hardware implementation?

Started by dtsao April 12, 2007
Hi,

Does anyone have suggestions on efficient ways to implement an FIR filter
in Verilog? My filter will need to be 256 taps and run at 27 MHz (so I
guess not that fast). I have already implemented an FIR using Distributed
Arithmetic, but would like to know of any other methods you think might be
good. Thanks.

dtsao wrote:

> does anyone have suggestions on efficient ways to implement an FIR filter
> in verilog? My filter will need to be 256 taps, and run at 27 Mhz (so I
> guess not that fast). I have already implemented an FIR using Distributed
> Arithmetic, but would like to know any other methods you think might be
> good. Thanks.
You minimize hardware by iterating a single multiplier through all the taps. You maximize speed by implementing each tap separately in hardware. (You can use a loop in Verilog; it will expand out to separate hardware units.) What you want is somewhere in between.

Figure out how fast you can run the tap hardware, divide by 27 MHz, and round down; that's how many taps you can do with each processing unit. Maybe subtract one to be sure. Divide 256 by that number and round up for the number of processing units needed.

-- glen
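The arithmetic above can be sketched as a quick sizing calculation (Python for clarity; the 200 MHz MAC clock is an assumed illustrative figure, not from the thread):

```python
import math

def processing_units(num_taps, sample_rate_hz, mac_clock_hz, margin=1):
    """Estimate how many multiply-accumulate (MAC) units are needed when
    each unit is time-shared across several taps.
    mac_clock_hz is the clock rate the tap hardware can close timing at."""
    # Round down, and subtract one tap per unit as a safety margin.
    taps_per_unit = mac_clock_hz // sample_rate_hz - margin
    # Round up to cover all taps.
    units = math.ceil(num_taps / taps_per_unit)
    return taps_per_unit, units

# Assuming the MAC hardware runs at 200 MHz with a 27 MHz sample rate:
print(processing_units(256, 27_000_000, 200_000_000))  # -> (6, 43)
```

So under that assumed 200 MHz figure, each MAC handles 6 taps and 43 MACs cover all 256 taps.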
"dtsao" <tsaodave@yahoo.ca> wrote in message 
news:H_2dnagdL_ru7IPbnZ2dnUVZ_hCdnZ2d@giganews.com...
> Hi,
>
> does anyone have suggestions on efficient ways to implement an FIR filter
> in verilog? My filter will need to be 256 taps, and run at 27 Mhz (so I
> guess not that fast). I have already implemented an FIR using Distributed
> Arithmetic, but would like to know any other methods you think might be
> good. Thanks.
I can give you some ideas, but I'm not such a Verilog expert:

Many FIR filters are symmetric. If multiplies are expensive relative to adds, you might delay and add the symmetric sample pairs first, then multiply once per pair. This has been done by others.

Again, if multiplies are expensive relative to adds, you might round the coefficients off to sums of powers of two (SPT) and implement the multiplies with shifts and adds. There's a fair amount of literature on getting *optimum* coefficients, but I find that a spreadsheet works pretty well for deciding which coefficients to use. This has also been done by others. (Before doing this, scale the coefficients so that the largest coefficient(s) *are* a power of two; then, depending on the arithmetic, all you have to do for those taps is add the sample(s). Example: for an original coefficient of 0.7, use (1.0 - 0.25) or (0.5 + 0.25) to get an approximated coefficient of 0.75.)

If you want to get fancy using SPT, here is a suboptimum method using the Remez exchange algorithm (somewhat modified). Choose a maximum quantization level for the coefficients, then:

1) Find an optimum filter using the algorithm.
2) Using the quantization level you decided on, search for the coefficient (or coefficient pair) which, when quantized all by itself, yields the highest error in the filter response.
3) Choose the direction of quantization for that one coefficient that yields the lowest error.
4) "Fix" that one coefficient at the quantized value.
5) Subtract the contribution of that one coefficient from the desired filter response, forming a new desired function.
6) Design an optimum filter, leaving out the basis function for the coefficient that's been fixed.
7) Repeat from step 2, reducing the number of variable coefficients by one and increasing the number of fixed coefficients by one at each iteration.

I keep meaning to implement this and test it. For now it remains an unproven idea.
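The first two tricks above (symmetric folding and SPT coefficients) can be sketched in Python to show the arithmetic; a real design would of course be Verilog, and the greedy SPT decomposition here is only one simple way to pick the terms:

```python
import math

def symmetric_fir(samples, half_coeffs):
    """Linear-phase FIR with folded symmetry: add each symmetric sample
    pair first, then multiply once -- halving the multiplier count.
    Assumes an even-length filter whose second half mirrors the first."""
    n = 2 * len(half_coeffs)
    return sum(c * (samples[k] + samples[n - 1 - k])
               for k, c in enumerate(half_coeffs))

def to_spt(c, terms=2):
    """Greedily approximate coefficient c as a sum of `terms` signed
    powers of two, so each multiply becomes a few shifts and adds.
    (A spreadsheet or exhaustive search may do better; this is a sketch.)"""
    parts, r = [], c
    for _ in range(terms):
        if r == 0:
            break
        # Nearest signed power of two to the remaining residue.
        p = math.copysign(2.0 ** round(math.log2(abs(r))), r)
        parts.append(p)
        r -= p
    return parts

print(to_spt(0.7))  # -> [0.5, 0.25], i.e. the 0.75 from the example above
```

With `[0.5, 0.25]`, the 0.7 tap becomes one right-shift-by-1 and one right-shift-by-2 feeding an adder, and no multiplier at all.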
My experience with the algorithms suggests it will work well. The disadvantage is that you have to select the quantization level up front (although one could certainly play with ways to adjust that). The advantage is that the "optimization" will be much faster than a search for the optimum solution: for a length-N symmetric filter, the compute time would be comparable to designing N/2 filters of ever-decreasing length, starting at length N.

Why do we quantize the *worst* contributor to error first? Because, philosophically, if we're going to quantize the filter coefficients anyway, we may as well get the worst one out of the way first; subsequent optimizations may then compensate for it somewhat. (There could be a proof that this is nonsense, or not, or one or the other in specific cases.)

Fred
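Steps 2-4 of the procedure can be sketched in Python as a toy. This sketch deliberately omits the Remez redesign of steps 5-6 (which would need a full Parks-McClellan implementation) and measures error against the original coefficients on a frequency grid, so it only illustrates the "worst contributor first" ordering, not the complete method:

```python
import cmath
import math

def response_error(h, h_ref, n_freqs=64):
    """Max |H(w) - H_ref(w)| over a grid of frequencies in [0, pi]."""
    worst = 0.0
    for i in range(n_freqs):
        w = math.pi * i / (n_freqs - 1)
        diff = sum((a - b) * cmath.exp(-1j * w * k)
                   for k, (a, b) in enumerate(zip(h, h_ref)))
        worst = max(worst, abs(diff))
    return worst

def quantize_worst_first(h0, step):
    """Greedy 'worst contributor first' coefficient quantization
    (steps 2-4 above); the redesign of steps 5-6 is omitted."""
    h = list(h0)
    fixed = [False] * len(h)
    for _ in range(len(h)):
        # Step 2: find the free coefficient whose quantization, all by
        # itself, produces the largest response error.
        worst_i, worst_err = -1, -1.0
        for i in range(len(h)):
            if fixed[i]:
                continue
            trial = list(h)
            trial[i] = step * round(h[i] / step)
            err = response_error(trial, h0)
            if err > worst_err:
                worst_i, worst_err = i, err
        # Step 3: round that coefficient in whichever direction yields
        # the lower error.
        lo = step * math.floor(h[worst_i] / step)
        best = min((lo, lo + step), key=lambda q: response_error(
            [q if j == worst_i else h[j] for j in range(len(h))], h0))
        # Step 4: fix it at the quantized value.
        h[worst_i] = best
        fixed[worst_i] = True
    return h
```

Every returned coefficient lands on a multiple of `step`, within one `step` of its original value; whether this ordering actually beats naive rounding would need the full redesign loop that Fred describes.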