Optimizing the Half-band Filters in Multistage Decimation and Interpolation
This blog discusses a not so well-known rule regarding the filtering in multistage decimation and interpolation by an integer power of two. I'm referring to sample rate change systems using half-band lowpass filters (LPFs) as shown in Figure 1. Here's the story.
Figure 1: Multistage decimation and interpolation using
Multistage Decimation – A Very Brief Review
Figure 2(a) depicts the process of decimation by an integer factor D. That is, lowpass FIR (linear-phase) filtering followed by downsampling. The downsampling operation '↓D' means discard all but every Dth input sample.
When the desired decimation factor D is large, say D > 10, a large number of multipliers is necessary within the tapped-delay line of lowpass filter LPF0. Early DSP pioneers, upon whose shoulders we stand, determined that a more computationally efficient scheme uses multiple decimation stages as shown in Figure 2(b). There the product of integers D1 and D2 is our desired overall decimation factor, that is, D1D2 = D. The point is, in Figure 2 the sum of the number of multipliers in filters LPF1 and LPF2 will be significantly less than the number of multipliers in LPF0.
Figure 2: Decimation by integer factor D: (a) single-stage;
And, happily, in real-time operations the LPF2 filter operates at a lower clock rate than the LPF0 filter. So for decimation by D = 10, the smart thing to do is use the Figure 2(b) implementation where D1 = 2 and D2 = 5. Let's now look at multistage decimation by an integer power of two.
Multistage Decimation by an Integer Power of Two
The first focus of this blog is multistage decimation by an integer power of two, such as the decimation by D = 8 process shown in Figure 3. The hn[k] variables in that figure represent the coefficients of the lowpass FIR filters.
Figure 3: Three-stage decimation by D = 8 using half-band
lowpass filters. Filter impulse responses are hn[k].
Multistage decimation by an integer power of two is super efficient because all the filters are half-band lowpass filters, where roughly every other filter coefficient is zero-valued. (For example, a 31-tap half-band filter will have only 17 nonzero-valued coefficients).
Years ago when I first saw Figure 3 I thought, "Neat. All I have to do is pass my input signal through a cascade of three identical 'half-band filter and downsample by two' stages." That is, it seemed convenient to use the same half-band filter for each of the three stages. As was pointed out to me, that would be a mistake! [1,2]
Indeed, the most efficient implementation of
Figure 3 is to use three different half-band filters.
As it turns out, the number of h1[k] coefficients can be less than the number of h2[k] coefficients. And the number of h2[k] coefficients can be less than the number of h3[k] coefficients.
We prove this important principle by way of example starting with Figure 4. That figure shows the spectra of the outputs of the initial operations in Figure 3 when fs,in = 8 kHz. Figure 4(a) shows the spectrum of our x[n] input signal and the dashed lines in Figure 4(b) show the frequency magnitude response, |H1(f)|, of our first half-band filter. Figure 4(c) shows that filter's |A(f)| output spectrum.
Figure 4: Initial Figure 3 decimation by D = 8 processing:
Figure 4(d) shows that after our first downsample by two operation aliasing does occur. However, that aliased spectral energy (which does not contaminate the spectrum of our dark-shaded signal of interest) will be eliminated in the follow on filtering. Figure 5 shows the spectra of the outputs of the remaining operations in Figure 3.
Figure 5: Final Figure 3 decimation by D = 8 processing:
We see in Figures 3 and 4 that the normalized transition region widths, the dashed lines of |H1(f)|, |H2(f)|, and |H3(f)|, of the three half-band filters are different. The transition region widths of the h1[k] and h2[k] filters need not be as narrow as the transition region width of the h3[k] filter. And if the numbers of multipliers in the Figure 3 filters are N1, N2, and N3, we can say:
For decimation: N1 < N2 < N3.
Thus in our relentless pursuit of computational efficiency, for optimum multistage decimation those three half-band filters will each have a different number of multipliers. That situation is the main point I wanted to make in this blog. Let's conclude this blog by taking a quick look at multistage interpolation by an integer power of two.
Multistage Interpolation by an Integer Power of Two
Figure 6 depicts multistage interpolation by the factor I = 8. An upsampling operation '↑2' means insert one zero-valued sample between each of the upsampler's input samples. The lowpass filters in Figure 6 are linear-phase, symmetrical-coefficient, half-band filters.
Figure 6: Three-stage interpolation by I = 8 using
half-band lowpass filters
Similar to the multistage decimation in Figure 3, the most efficient implementation of Figure 6's multistage interpolation is to use three different half-band filters. That is, in Figure 6 the number of h3[k] coefficients can be less than the number of h2[k] coefficients. And the number of h2[k] coefficients can be less than the number of h1[k] coefficients.
We show this situation by way of example starting with Figure 7. That figure shows the spectra of the outputs of the initial operations in Figure 6 when fs,in = 1000 Hz. Figure 7(a) shows the spectrum of our x[n] input signal. Figure 7(b) shows the spectrum of Figure 6's upsampled by 2 a[n] sequence. The shaded spectral components in Figure 7(b) are the spectral images generated by our first upsampling by two operation. An explanation of why those images exist is given in reference .
The dashed lines in Figure 7(b) show the frequency magnitude response, |H1(f)|, of our first half-band filter whose function is to eliminate the unwanted shaded spectral images in |A(f)|.
Figure 7: Initial three-stage interpolation by I = 8 processing:
Figure 7(c) shows that filter's |A(f)| output spectrum. Figure 8 shows the output spectra of the four remaining operations in Figure 6.
Figure 8: Final three-stage interpolation by I = 8 processing:
By careful examination of the frequency axis values, we can see in Figures 7 and 8 that the normalized transition region widths, the dashed lines of |H1(f)|, |H2(f)|, and |H3(f)|, of the three half-band filters are different. The transition region widths of the h2[k] and h3[k] filters need not be as narrow as the transition region width of the h1[k] filter. Thus if the numbers of multipliers in the Figure 6 filters are N1, N2, and N3, we can say:
For interpolation: N1 > N2 > N3.
I have shown that multistage sample rate change by an integer power of two is most computationally efficient when the half-band filters each have a different number of multipliers. To quote Forrest Gump, "And that's all I have to say about that."
 Evgeny Filatov. Private communication. Dec. 20, 2014.
 Wolfram Humann. Private communication. Dec. 18, 2015.
Just to add the perspective of FPGA. Here we are also concerned about the design effort (for manual coding at least) and associated bugs and testing. Thus a single stage FIR may also have to be considered and may be preferred over a cascade of filters. In some other cases a multistage offers additionally support for more than one configuration of rate change.
For CDMA/LTE Tx/Rx chains we do cascading (3 to 4 stages). I know it turns out better but I don't have a clear explanation as exactly why and is it at expense of design time.
Thanks for your comment.
In your 2nd paragraph you wrote, "I know it turns out better but I don't have a clear explanation as exactly why." Are you saying that you don't know why, for example, multistage decimation is more computationally efficient than single-stage decimation?
You're right. Most descriptions of how multistage decimation is more computationally efficient than single-stage decimation are by way of examples. I don't have an algebraic proof of this principle. But the paper: "Interpolation and Decimation of Digital Signals-A Tutorial Review" by Crochiere and Rabiner may have the mathematical proof you desire. (I don't have a web page where you can access that paper.) Another paper that might be useful to you, by the same authors, is at:
If you look at each step, each is totally symmetrical to the rest, so the filtering should be identical. In the case you present, the problem is unsymmetrical as the ratio of signal bandwidth to sample rate changes for each (by the decimation factor).
Quiz question: is there a computationally more efficient way to split signal power into 2 bands (0 to nyquist rate/2 and nyquist rate/2 to nyquist rate, than using 2 separate filters? What about nyquist rate/4 or some other ratio (is there a "magic ratio" where many of the FIR terms become zero?)
With passband and stopband design constraints it may not always be possible to reduce the filter size at each decimation stage.
Could you please comment on what happens when a signal length is shorter than a filter length. Can I still apply normal filter analysis in this situation? How is the signal going to be affected (or distorted?)
Joy, please forgive me. I dont have an answer to your short-length input sequences question off the top of my head right now, but I am going to study the situation youre asking about. Ill either answer you in a future Reply here or if I learn something of interest Ill write a new blog addressing your question.
Thank you so much for your reply. Sorry, I think I didnt describe my question properly. What if I have a finite duration sine wave (i.e. a pulse signal of M samples) corrupted by noise. The time of arrival, signal amplitude, phase, frequency and bandwidth are all unknown. I would like to use FIR filter to reduce noise bandwidth to improve probability of detection. In this scenario, what happens when the FIR filter length is longer than the signal of interest (M), but shorter than the overall number of samples available. I know that the filtered pulse signal will be distorted and the characteristics of the signal will be changed. But what I am interested in is how the signal energy vs. noise energy affected?
When people do analysis on window function for DFT, we always talk about processing gain (processing loss to be more precise), because of the signal energy reduction introduced by the window function. But how come when we do filter analysis, we are not thinking in this way anymore? What is the difference? Can I consider my filter coefficients as a window function and start calculating signal energy that way?
Thank you for your time.
I read your first paragraph several times but, sadly, I was unable to understand it. You wrote that you have a sine wave sequence represented by M samples. OK so far. And you said the number of FIR filter coefficients is greater than M. Still OK so far. But then you wrote the number of FIR filter coefficients is "shorter than the overall number of samples available." Joy, what is this mysterious thing you call "number of samples available"? How is this undefined thing called "number of samples available" related to the M-length sine wave sequence?
I'm not sure I'm able to answer the questions in your second paragraph in a meaningful way. Let me say this: If you multiply an N-length x(n) signal sequence by an N-length w(n) window sequence you produce an N-length product sequence p(n), where p(n) = w(n)*x(n). If you multiply an N-length x(n) signal sequence by n N-length h(n) filter coefficient sequence you produce an N-length product sequence q(n), where q(n) = h(n)*x(n). But in FIR filtering you sum the N q(n) samples to produce a *SINGLE* sample that I'll call y(n). So p(n) has N samples and y(n) is a single sample. So all I'm saying is that a single windowing operation is very different than computing a single y(n) filter output sample. I'll bet you already know all about this.
In your second paragraph I wonder if you're thing about "FIR filter transient" behavior. If a convolutional (tapped-delay line) FIR filter has C coefficients, when an S-length input sequence is applied the filter's convolution output sequence length will be:
FIR filter output sequence length = C + S -1 samples.
If signal length S = 20 and the filter has C = 7 coefficients, then
FIR filter output sequence length = 7 + 20 -1 = 26 samples.
But because FIR filters have a "start" transient response (as the filter's delay line begins to fill with inputs samples) and an "end" transient response (as the input samples empty out of the filter's delay line) our filter's valid output sample sequence length will less than C + S -1 = 26 samples. The "start" and "end" transients are C-1 samples in length, and the number of valid filter output samples is:
Valid output sequence length = (C + S -1) -2(C-1) = 14 samples.
My phrase "valid output samples" means output sample values that correctly represent the portion of the FIR filter's input that resided within the filter's passband. Let's say an S = 20 input signal contained spectral energy at f1 Hz plus spectral energy at f2 Hz, and f1 was within the unity-gain passband of the filter and f2 was in the filter's stopband. Due to filter transient behavior the energy of the 14 "valid" f1 Hz filter output samples would be less than the energy of the 20 f1 Hz portion of the filter input samples.
But if S = 50,000 in the above scenario, the energy of the 50,000 -12 = 49,988 "valid" f1 Hz filter output samples will be practically equal to the energy of the 50,000 f1 Hz portion of the filter input samples. Joy, I have no idea if what I've written here is of any help to you.
Thank you very much for your reply and sorry for not making myself very clear.
I understand what you just said. I guess it probably doesnt make sense to filter a short signal with a long filter. This was your conclusion from previous post. I was just trying to understand what happens if I do filter my short signal with a long filter. For example, if S = 7 can C= 20, I still get 7+20-1 samples out of the filter, even they dont make sense fundamentally and they are just part of the 2(C-1) = 38 filter transient response. I was told to simply ignore those 38 samples during my university education. But now I have a system which does exactly that and forcing me to interrupt those data. (The signal duration is unknown to the system).
Perhaps, I need to rethink my problem. Thank you for your time.
Love your Understanding Digital Signal Processing book by the way. Its awesome!
The good news is that it's fairly easy for you to model, in software, the various filtering scenarios you're thinking about to see exactly how the output sequences of FIR filters are related to their input sequences. If you have further questions I suggest you think about posting them on the dsprelated.com Forum here. That way you'll receive multiple answers from various people, which is usually a good thing.
Joy, if you have a hardcopy of my "Understanding DSP"
book I can send you the appropriate errata for your
copy of my book if you wish. But you must tell me two
things: (1) the Edition number (1st, 2nd, or 3rd)
and (2) the "Printing Number" of your copy of my book.
You can determine the "Printing Number"
of my book by looking at the page just before the "Dedication"
page. On that page (before the Dedication) you'll see all
sorts of publisher-related information, including the
ISBN-10 number. At the very bottom of the page you
should see lines printed something like:
Text printed in the United States on ...
indicating the "First Printing" of the book. However, for
later printings the second line above may have words
such as: "Second Printing" or maybe "Fourth Printing".
My e-mail address is: R dot Lyons at ieee dot org.
Thank you for another excellent post.
The earlier(/latter) stages in a cascade of decimating(/interpolating) halfband filters can be made shorter. I think an interesting question is: "by how much?". Do you know any rules of thumb?
Looking at the paper you linked in your comment above, I think the answer may lie in their Equation (20) for decimators and Equation (25) for interpolators. However, it will take me a while to decipher their notation and perhaps eventually extract a simplified expressions for cascades of halfband decimators/interpolators. I hope I can find time to do this, because I am currently confined to case-by-case simulations, which is not a very sophisticated way of doing things.
To post reply to a comment, click on the 'reply' button attached to each comment. To post a new comment (not a reply to a comment) check out the 'Write a Comment' tab at the top of the comments.
Registering will allow you to participate to the forums on ALL the related sites and give you access to all pdf downloads.