Sign in

Not a member? | Forgot your Password?

Search blogs

Search tips

Free PDF Downloads

A Quadrature Signals Tutorial: Complex, But Not Complicated

Understanding the 'Phasing Method' of Single Sideband Demodulation

Complex Digital Signal Processing in Telecommunications

Introduction to Sound Processing

C++ Tutorial

Introduction of C Programming for DSP Applications

Fixed-Point Arithmetic: An Introduction

Cascaded Integrator-Comb (CIC) Filter Introduction

Articles by category

FFT Spectral Analysis Software

See Also

Embedded SystemsFPGA

Blogs > Markus Nentwig > FIR sideways (interpolator polyphase decomposition)

Markus Nentwig (contact)
Markus received his Dipl. Ing. degree in electrical engineering / communications in 1999. Work interests include RF transceiver system design, implementation, modeling an...show full bio

Would you like to be notified by email when mnentwig publishes a new blog?

  




Pageviews: 899

FIR sideways (interpolator polyphase decomposition)

Posted by Markus Nentwig on Sep 12 2012 under Matlab | Basics | Multirate DSP   

An efficient implementation of a symmetric-FIR polyphase 1:3 interpolator that doesn't follow the usual tapped delay line-paradigm.
The example exploits the impulse response symmetry and avoids four multiplications out of 10.
keywords: symmetric polyphase FIR filter implementation ASIC

Matlab / Octave implementation

Introduction

An interpolating FIR filter can be implemented with a single tapped delay line, possibly going forwards and backwards for a symmetric impulse response. To optimize it further, I need to understand only a few things, namely "cascade" and "Nobel" identities, transposed filter structures and how to spell "polyphase decompostition".
Or maybe I forget everything about theory and just look at the implementation, which seems rather straightforward. Even obvious, after staring at it for an hour (do not blink).

(Non)-theory

Figure 1 shows a 1:3 interpolating FIR filter at a conceptual level.
After every input sample x, two zero samples are inserted.
z-1 refers to a delay at the high (output) rate.
The delay line contents in the middle of the picture illustrate, how the non-zero input samples and inserted zeros move through the delay line at three consecutive time instants.
For interpolation-by-three, the filter cycles through three 'phases', where different taps hold the non-zero samples.


Figure 1: FIR interpolation and polyphase equations

Clearly there is no point in performing expensive multiplications with (non)-samples that are known to be zero (BUT: this is perfectly fine for Matlab coding, as long as it helps to keep things 'simple and stupid'. Usually it does).
The equations at the bottom of Figure 1 show only remaining terms, if products with zeros are omitted.
The three output samples y1, y2 and y3 are calculated for each input sample (the x before zero insertion), as the interpolating FIR produces three output samples per input sample.

Textbook polyphase FIR interpolator

Instead of inserting zeros and then carefully multiplying around them, the structure can be rewritten using "slower" delays z-3 at the input rate (=> zzz => the sample goes to sleep for a while. Ahem.).
Now three output samples are generated for each step of the delay line, as shown in Figure 2:


Figure 2: Conventional tapped delay-line implementation\

'FIR sideways':An alternative implementation

So far so good.
The implementation in Figure 2 is maybe the most common way to implement such a filter.
But, there is an alternative:


Figure 3: Alternative implementation

Figure 3 uses individual delay lines for each coefficient. The structure can always be implemented using one multiplication per independent coefficient and input sample.
The number of multiplications has been reduced from 10 to 6. And if the center tap can be fixed to 1, as is common, it improves from 9 to 5, which is even better.

Conclusion

This article shows a polyphase FIR interpolator for symmetric impulse responses that uses fewer multiplications than the textbook 'single-delay-line' implementation, in the example only two per output sample.
Does it make sense? Possibly "yes", justified by the application-dependent cost of multiplications.
In real life, "cost" is rarely as straightforward, but it may be worth having a look at.
For example, [1] (Fig 2.4) shows an application where a "non-canonical" polyphase structure can be beneficial.
Possible savings differ, depending on interpolation factor and filter length.

Matlab demo

The Matlab / Octave example calculates the filter response using the three methods outlined in this article.
The resulting plot (Fig. 4) shows the output, three identical and thus overlapping traces.


Figure 4: Matlab output

References

[1] Improved multirate polyphase-based interpolation



Rate this article:
5
Rating: 5 | Votes: 1
 
   
 
posted by Markus Nentwig
Markus received his Dipl. Ing. degree in electrical engineering / communications in 1999. Work interests include RF transceiver system design, implementation, modeling and verification. He works as senior architect for Renesas Mobile Europe in Finland.

Previous post by Markus Nentwig: Design of an anti-aliasing filter for a DAC
Next post by Markus Nentwig: 'z' as in 'Zorro': Frequency Masking FIR
all articles by Markus Nentwig

Comments / Replies


kaz
Said:
Hi Markus, For an interpolator (usually polyphase structure), the number of multipliers = ceil(taps/I/folding factor). Where I = interpolation factor and folding factor is system floor(clock/sample rate). Thus in your case I expect 10/3/1 = 4 or even 3 if peak tap is power of 2. With the polyphase structure, we only need one delay line equal to taps/I and the symmetry is lost but it stays as most efficient approach. Kadhiem
2 years ago
0
Reply
Sorry, you need javascript enabled to post any comments.
mnentwig
Replied:
Hello, thanks for your comment. Now I'm not sure if I understand it correctly. Please clarify, if I got it wrong: A conventional polyphase would use four multiplications per -output- sample, which is (interpolating by 3) twelve multiplications per input sample. As my example impulse response is a bit shorter, eleven remain, and one is duplicate => ten multiplications per input sample. My proposal uses 6 multiplications per -input- sample. As I get three output samples, I need only two multiplications per -output- sample, instead of four. -markus
2 years ago
0
kaz
Replied:
That seems ok. The point you are raising is multiplications per output (or input). The point I am raising is multipliers per output(or input). It depends on perspective. Mine is FPGA and we focus on the latter. May you are concerned about speed or power consumtion. Thanks. By the way, your index of coeffs is not right (possibly typing error) in Fig 2 Kadhiem
2 years ago
0
mnentwig
Replied:
Hello, in the end, it boils down to how many multiplications per second I need for a given signal. Compared to a "canonical" implementation, the above proposal should need about half.
2 years ago
0
mnentwig
Replied:
BTW, where is the mistake in Fig. 2? I can't see it, but typos are usually good at "not being seen"...
2 years ago
0
kaz
Replied:
Oops it is the way you indexed for symmetry, a bit confusing, I would normally use h1 ~ h11 rather than repeat the second half with same index of first half. Sorry.
2 years ago
0
mnentwig
Replied:
Hello, the whole exercise works only if the impulse response is symmetric, because I exploit c7 = c5 etc. I'll point out the need for symmetric IR more clearly in the introduction. Fortunately, I'd expect that the typical polyphase FIR impulse response is symmetric: If I need asymmetry to equalize group delay, I could do that more efficiently at the low rate.
2 years ago
0
kaz
Replied:
Hi Markus, Just to confirm what I said about FPGA perspective. Given the choice of 4 multipliers for entire fir design or 6 multipliers, any fpga designer will certainly choose 4. The 3 outputs will take turns to use same 4 multipliers. It seems that the perspective of dsp (software as opposed to fpga dsp) is different. I can imagine that in terms of power consumption, yes 6 multiplications per input may be favoured over 4x3 per input. Kadhiem
2 years ago
0
mnentwig
Replied:
I think most DSPs actually handle a multiplication in one clock cycle. If so, I'd expect that multiplications don't matter at all. But on an ASIC; for example, an algorithm that uses fewer multiplications for a given processing job can save quite a bit of silicon area, as that's where the money goes.
2 years ago
0
Sorry, you need javascript enabled to post any comments.