Tweets by @dsprelated

A Quadrature Signals Tutorial: Complex, But Not Complicated

Understanding the 'Phasing Method' of Single Sideband Demodulation

Complex Digital Signal Processing in Telecommunications

Introduction to Sound Processing

Introduction of C Programming for DSP Applications

Markus received his Dipl. Ing. degree in electrical engineering / communications in 1999. Work interests include RF transceiver system design, implementation, modeling an...show full bio

**Would you like to be notified by email when mnentwig publishes a new blog?**

Follow @DSPRelated

*An efficient implementation of a symmetric-FIR polyphase 1:3 interpolator that doesn't follow the usual tapped delay line-paradigm. The example exploits the impulse response symmetry and avoids four multiplications out of 10. keywords: symmetric polyphase FIR filter implementation ASIC*

Matlab / Octave implementation

An interpolating FIR filter can be implemented with a single tapped delay line, possibly going forwards and backwards for a symmetric impulse response. To optimize it further, I need to understand only a few things, namely "cascade" and "Nobel" identities, transposed filter structures and how to spell "polyphase decompostition".

Or maybe I forget everything about theory and just look at the implementation, which seems rather straightforward. Even obvious, after staring at it for an hour (do not blink).

Figure 1 shows a 1:3 interpolating FIR filter at a conceptual level.

After every input sample *x*, two zero samples are inserted.

z^{-1} refers to a delay at the high (output) rate.

The delay line contents in the middle of the picture illustrate, how the non-zero input samples and inserted zeros move through the delay line at three consecutive time instants.

For interpolation-by-three, the filter cycles through three 'phases', where different taps hold the non-zero samples.

Figure 1: FIR interpolation and polyphase equations

Clearly there is no point in performing expensive multiplications with (non)-samples that are known to be zero (*BUT: this is perfectly fine for Matlab coding, as long as it helps to keep things 'simple and stupid'. Usually it does*).

The equations at the bottom of Figure 1 show only remaining terms, if products with zeros are omitted.

The three output samples y_{1}, y_{2} and y_{3} are calculated for each input sample (the *x* before zero insertion), as the interpolating FIR produces three output samples per input sample.

Instead of inserting zeros and then carefully multiplying around them, the structure can be rewritten using "slower" delays z^{-3} at the input rate (=> *zzz* => the sample goes to sleep for a while. Ahem.).

Now three output samples are generated for each step of the delay line, as shown in Figure 2:

Figure 2: Conventional tapped delay-line implementation\

So far so good.

The implementation in Figure 2 is maybe the most common way to implement such a filter.

But, there is an alternative:

Figure 3: Alternative implementation

Figure 3 uses individual delay lines for each coefficient. The structure can always be implemented using one multiplication per independent coefficient and input sample.

The number of multiplications has been reduced from 10 to 6. And if the center tap can be fixed to 1, as is common, it improves from 9 to 5, which is even better.

This article shows a polyphase FIR interpolator for symmetric impulse responses that uses fewer multiplications than the textbook 'single-delay-line' implementation, in the example only two per output sample.

Does it make sense? Possibly "yes", justified by the application-dependent cost of multiplications.

In real life, "cost" is rarely as straightforward, but it may be worth having a look at.

For example, [1] (Fig 2.4) shows an application where a "non-canonical" polyphase structure can be beneficial.

Possible savings differ, depending on interpolation factor and filter length.

The Matlab / Octave example calculates the filter response using the three methods outlined in this article.

The resulting plot (Fig. 4) shows the output, three identical and thus overlapping traces.

Figure 4: Matlab output

[1] Improved multirate polyphase-based interpolation

Markus received his Dipl. Ing. degree in electrical engineering / communications in 1999. Work interests include RF transceiver system design, implementation, modeling and verification. He works as senior architect for Renesas Mobile Europe in Finland.

Previous post by Markus Nentwig:

Next post by Markus Nentwig:

Comments / Replies

kaz

Said:

Hi Markus,
For an interpolator (usually polyphase structure), the number of multipliers
= ceil(taps/I/folding factor).
Where I = interpolation factor and
folding factor is system floor(clock/sample rate). Thus in your case I expect 10/3/1 = 4
or even 3 if peak tap is power of 2.
With the polyphase structure, we only need one delay line equal to taps/I and the symmetry is lost but it stays as most efficient approach.
Kadhiem

2 years ago

0

Sorry, you need javascript enabled to post any comments.

mnentwig

Replied:

Hello,
thanks for your comment. Now I'm not sure if I understand it correctly. Please clarify, if I got it wrong:
A conventional polyphase would use four multiplications per -output- sample, which is (interpolating by 3) twelve multiplications per input sample.
As my example impulse response is a bit shorter, eleven remain, and one is duplicate => ten multiplications per input sample.
My proposal uses 6 multiplications per -input- sample. As I get three output samples, I need only two multiplications per -output- sample, instead of four.
-markus

kaz

Replied:

That seems ok. The point you are raising is multiplications per output (or input). The point I am raising is multipliers per output(or input). It depends on perspective. Mine is FPGA and we focus on the latter. May you are concerned about speed or power consumtion.
Thanks. By the way, your index of coeffs is not right (possibly typing error) in Fig 2
Kadhiem

mnentwig

Replied:

Hello,
the whole exercise works only if the impulse response is symmetric, because I exploit c7 = c5 etc.
I'll point out the need for symmetric IR more clearly in the introduction.
Fortunately, I'd expect that the typical polyphase FIR impulse response is symmetric: If I need asymmetry to equalize group delay, I could do that more efficiently at the low rate.

kaz

Replied:

Hi Markus,
Just to confirm what I said about FPGA perspective. Given the choice of 4 multipliers for entire fir design or 6 multipliers, any fpga designer will certainly choose 4. The 3 outputs will take turns to use same 4 multipliers.
It seems that the perspective of dsp (software as opposed to fpga dsp) is different.
I can imagine that in terms of power consumption, yes 6 multiplications per input may be favoured over 4x3 per input.
Kadhiem

mnentwig

Replied:

I think most DSPs actually handle a multiplication in one clock cycle. If so, I'd expect that multiplications don't matter at all.
But on an ASIC; for example, an algorithm that uses fewer multiplications for a given processing job can save quite a bit of silicon area, as that's where the money goes.

Sorry, you need javascript enabled to post any comments.