Scaling a Network's Gain: A Note For DSP Beginners

Rick Lyons · March 29, 2019 · 17 comments

This blog briefly discusses a topic that is well known to experienced DSP practitioners but may not be so well known to DSP beginners. The topic is the proper way to scale a digital network in order to reduce the network's gain.

Digital Network Scaling
Figure 1 shows a collection of networks I've seen, in the literature of DSP, where scaling is implemented.

                    FIGURE 1. Examples of scaled digital networks.

Focusing on the network in Figure 1(a), I encountered that lowpass filter block diagram while studying the subject of frequency sampling filters described in a popular college DSP textbook. Such a filter would normally have a passband gain of 8, and in order to force the filter's passband gain to be equal to one the authors included the 1/8 scaling multiplication operation at the input of the filter.

Assuming Figure 1(a)'s x(n) input was generated by an A/D converter and we're using fixed-point arithmetic, such a scaling method would discard the three least significant bits of the x(n) input sequence. Such scaling decreases the resulting input's signal-to-quantization-noise ratio (SQNR) by an unacceptable 18 dB.
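The 18 dB figure follows from the rule of thumb that each bit of resolution is worth about 6.02 dB of SQNR. Below is a minimal Python sketch of that arithmetic. The 12-bit word length, the sine test signal, and its frequency are assumptions chosen for illustration and do not come from the textbook example. Rounding is used for both quantizers so the comparison isolates the loss of resolution; plain truncation would add a DC bias on top of it.

```python
import math

def sqnr_db(signal, quantized):
    """Ratio of signal power to quantization-error power, in dB."""
    sig_pow = sum(s * s for s in signal)
    err_pow = sum((s - q) ** 2 for s, q in zip(signal, quantized))
    return 10.0 * math.log10(sig_pow / err_pow)

BITS = 12                     # assumed A/D word length (hypothetical)
N = 4096
AMPLITUDE = 2 ** (BITS - 1) - 1

# Near-full-scale sine whose frequency is not a simple fraction of Fs
ideal = [AMPLITUDE * math.sin(2 * math.pi * 0.1234567 * n) for n in range(N)]

q_full = [round(s) for s in ideal]             # all 12 bits kept (step size 1)
q_minus3 = [8 * round(s / 8) for s in ideal]   # three LSBs gone (step size 8)

loss_db = sqnr_db(ideal, q_full) - sqnr_db(ideal, q_minus3)
rule_of_thumb_db = 20 * math.log10(2 ** 3)     # about 18.06 dB
```

Under these assumptions `loss_db` should land close to `rule_of_thumb_db`.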

Input signal bits are valuable. Don't throw them away! For proper implementation, the preferred scaling method for Figure 1(a)'s filter is to place the 1/8 scaling at the output, as shown in Figure 2(a). That guidance is the main point of this blog: scale a network's output sequence, not its input sequence. The other networks in Figure 2 illustrate properly implemented scaling.
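To make the difference concrete, here is a small integer-arithmetic sketch in Python. It does not model Figure 1(a)'s frequency sampling filter; an 8-tap moving sum is used as a hypothetical stand-in because it also has a gain of 8, and the test data are arbitrary.

```python
def moving_sum(x, taps=8):
    """Gain-of-8 stand-in filter: sum of the most recent `taps` samples."""
    return [sum(x[max(0, n - taps + 1): n + 1]) for n in range(len(x))]

# Arbitrary signed 8-bit test data (an assumption, not from the post)
x = [((37 * n) % 256) - 128 for n in range(64)]

exact = [s / 8 for s in moving_sum(x)]            # ideal unity-gain output
scaled_in = moving_sum([v >> 3 for v in x])       # 1/8 applied at the input
scaled_out = [s >> 3 for s in moving_sum(x)]      # 1/8 applied at the output

err_in = max(abs(a - b) for a, b in zip(scaled_in, exact))
err_out = max(abs(a - b) for a, b in zip(scaled_out, exact))
```

With output scaling the worst-case error stays below one output LSB, because only one truncation occurs. With input scaling every one of the eight summed samples has already lost its three LSBs, so the errors accumulate.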

                    FIGURE 2. Properly scaled digital networks.

A Warning
Some days ago I encountered the Figure 1(c) Simpson's 1/3 Rule digital integrator (repeated here as Figure 3) in a book discussing biomedical signal processing. It was this network that prompted me to write this blog.

         FIGURE 3. Incorrect implementation of a Simpson's 1/3 Rule integrator.

Figure 3 is an attempt to implement a Simpson's 1/3 Rule integrator defined by the following difference equation:
    $$y(n) = \frac{[x(n)+4x(n-1)+x(n-2)]}{3}+y(n-2)$$
My warning is: the Figure 3 network does not compute correct Simpson's 1/3 Rule integration results. I haven't yet created a correct block diagram for a Simpson's 1/3 Rule integrator, but if I do, it will not have a 1/3 multiplier at its input.
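For readers who want to experiment, the difference equation itself can be evaluated directly in software. The Python sketch below is one possible arrangement and is not the author's block diagram: it keeps a running sum equal to 3·y(n) and applies the 1/3 factor only when producing the output. The function name and test signal are hypothetical. The test signal starts at zero so that the assumed zero initial conditions introduce no offset.

```python
def simpson_integrator(x):
    """Evaluate y(n) = y(n-2) + [x(n) + 4x(n-1) + x(n-2)]/3, zero initial conditions.

    `acc` holds 3*y(n), so the division by 3 happens only at the output.
    """
    acc = [0] * len(x)
    for n in range(len(x)):
        x1 = x[n - 1] if n >= 1 else 0
        x2 = x[n - 2] if n >= 2 else 0
        prev = acc[n - 2] if n >= 2 else 0
        acc[n] = prev + x[n] + 4 * x1 + x2
    return [a / 3 for a in acc]

x = [n * n for n in range(7)]   # samples of t^2 at unit spacing
y = simpson_integrator(x)
```

Simpson's rule is exact for polynomials up to third order, so at even n the output should match the integral of t^2 from 0 to n, which is n^3/3.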




Comment by kaz, March 29, 2019

Hi Rick,

I am unable to understand what the problem is in the first place.

The fractions alpha and (1-alpha) are written on diagrams to describe the system. In reality the implementation depends on the platform. For fixed point(ASIC/FPGA) if alpha = 0.013 as example then we scale it say to 0.013 *2^15 and same for (1-alpha). Finally we scale down the result by 2^15 (discard 15 LSBs).

For floating point there is no worry of losing bits.

Comment by Rick Lyons, March 29, 2019

Hi Kaz.

The problem is the degradation suffered by an input signal's 'Signal to Quantization Noise Ratio' when the very first operation of a digital system is attenuation.

Kaz, please forgive me. I'm not familiar with the peculiarities of ASIC/FPGA implementations. I don't understand what you meant when you wrote, "For fixed point(ASIC/FPGA) if alpha = 0.013 as example then we scale it say to 0.013 *2^15 and same for (1-alpha)." Are you saying: In FPGAs, if we want to multiply a discrete sample by decimal 0.013 then we would actually multiply that sample by decimal 0.013*2^15 = 425.984?

Comment by kaz, March 29, 2019

Hi Rick,

Yes we convert the fraction of 0.013 to 426 (rounded) using scale factor of 2^15. This scale factor is up to designer. If you want mathematical unity then you divide back by 2^15 at some final output allowing full bit growth inside a given module. This also applies to any multiplication including products of filters as each coeff is scaled internally, multiplied, summed given full bit growth, finally truncated by discarding lsbs plus any rounding.
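A minimal Python sketch of the pre-scale, multiply, then discard-LSBs sequence described above. The helper name `scale` and the test sample are hypothetical, and rounding by adding half an LSB before the shift is one common choice rather than the only one.

```python
FRAC_BITS = 15                                   # scale factor 2^15, the designer's choice
alpha = 0.013

# Fractions become integers by pre-scaling and rounding
alpha_q = round(alpha * (1 << FRAC_BITS))                  # 426
one_minus_alpha_q = round((1 - alpha) * (1 << FRAC_BITS))  # 32342

def scale(sample, coeff_q):
    """Multiply an integer sample by a pre-scaled coefficient,
    then drop 15 LSBs (with rounding) to restore the intended gain."""
    product = sample * coeff_q                   # full bit growth, no divider needed
    return (product + (1 << (FRAC_BITS - 1))) >> FRAC_BITS

x = 20000                                        # arbitrary integer input sample
approx = scale(x, alpha_q)                       # integer result
exact = alpha * x                                # floating-point reference
```

The two integer coefficients sum to exactly 2^15 here, so the pair still represents alpha + (1 - alpha) = 1.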

This power of 2 approach avoids using dividers which are slow and resource demanding. However I have seen many beginners learn the wrong way and use dividers.

In all cases we scale fractions to integers, though some tools prefer to use fractional notation as if in software, yet internally it is just a data bus with an imagined decimal point (in the above case the decimal point could be at bit 15).

In the same way we can divide by a constant by converting the operation to pre-scale, then multiply, then divide in hardware through truncation.

I am curious why you focussed (rightly) on scaling of input with alpha but not y with (1-alpha) which is also fraction. 

Comment by Rick Lyons, March 30, 2019

Hi kaz.

You wrote, "I am curious why you focussed (rightly) on scaling of input with alpha but not y with (1-alpha) which is also fraction." The point of my blog is nothing more than the following: Scaling a DSP system's gain by attenuating the system's input signal, prior to performing any follow-on computations, should be avoided if possible.

Comment by kaz, March 30, 2019

Hi Rick,

The principle itself is sound advice. But the example given is open to different views. 

My view is that I will keep integrator feedback gain under control by applying traditional alpha and 1-alpha inside feedback loop then scale down the output outside loop. Otherwise it could shoot up at very low values of alpha.

Comment by weetabixharry, March 30, 2019

Hi Rick,

I think your post demonstrates very neatly how two seemingly equivalent theoretical designs could give rise to vastly different performance in the implementation.

As you note, this kind of consideration is very well known and any implementation engineer worth their salt would weigh up the trade-offs. The reason I say it's a trade-off is that - for the cost of the 18dB degradation you mention - you have bought cheaper adders, multipliers and delay elements through the rest of the network. (It's always cheaper to add, multiply and delay narrower bit-widths in ASICs, and often in FPGAs too, with some caveats).

As you say, 18dB may be an unacceptable degradation in performance. But if you are also constrained by area and/or power requirements, then you may have to do some scaling at the start, some in the middle and some at the end.

But as an introductory article, I think your blog highlighted the key idea very concisely and was a joy to read.


Comment by Rick Lyons, March 30, 2019

Hi weetabixharry. Thanks for the compliment, but I must confess. The Exponential Averager in my Figure 1(b) comes straight out of my DSP book's Chapter 11. Years ago when I first studied exponential averagers I naively repeated the literature's typical block diagram as the averager's block diagram for my book. At that time I was thinking analytically rather than practically. 

Comment by niarn, April 1, 2019

Hi Rick, I believe that in many fixed-point systems the input gain does not throw away bits in the way you describe because the signal path is running with more fractional bits that what is coming from the ADC. Or if this is not the case then an input gain may be required to avoid wrap-around.

Comment by Rick Lyons, April 1, 2019

Hi niarn.

Is there a web page where I can see an example of what you call "running with more fractional bits that what is coming from the ADC"? I'm not sure what your word "running" means.

Comment by weetabixharry, April 1, 2019

Any division by $2^n$ can be considered implicit in a fixed point system. In other words, you do nothing but just remember that the meaning of the number has changed (i.e. has more fractional bits).

For example, let's say the input is 100.75, represented as a 16-bit two's complement signed number with 4 fractional bits:

0000011001001100 (64 + 32 + 4 + 0.5 + 0.25)

We could even insert a binary point if we want:

000001100100.1100

Then if we divide this number by 8, we get:

0000011001001100

or with the binary point:

000001100.1001100

The bits are exactly the same, but now the meaning is 8+4+0.5+1/16+1/32 = 12.59375.
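The same reinterpretation can be checked in a few lines of Python. This is only an illustration of the bookkeeping; the helper name `value` is hypothetical.

```python
bits = 0b0000011001001100          # the 16-bit word from the example

def value(word, frac_bits):
    """Interpret a non-negative integer word as having `frac_bits` fractional bits."""
    return word / (1 << frac_bits)

as_4_frac_bits = value(bits, 4)    # original meaning
as_7_frac_bits = value(bits, 7)    # same bits after an implicit divide by 8
```

The stored integer never changes. Only the number of fractional bits the designer attributes to it moves from 4 to 7.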

If we divide by something other than $2^n$, we can still increase or decrease the number of overall bits after the division. For example, if we divide 100.75 by 3 instead of 8, then we will get:

000000100001.10010101010101010101010101010... = 32+1+1/2+1/16+1/64+...

It's up to us how many repetitions of "10" we want to keep at the end.

Comment by Rick Lyons, April 2, 2019

Hi weetabixharry. Is your above example (where division by 8 merely moves a number's binary point to the left three bits) the kind of arithmetic processing that can take place in an FPGA?

Comment by weetabixharry, April 2, 2019

Hi Rick,

In my experience, this concept is extremely widely used in FPGAs and ASICs. For a fixed (i.e. fixed at compile time) multiplication or division by $2^n$, the cost in terms of hardware resources (and power consumption) is exactly zero because the position of the binary point is not represented in any way in hardware. It’s something the designer has to keep track of and manage appropriately. This can become surprisingly difficult after even just a few stages of processing (where each stage may change both the physical bit-width and the conceptual position of the binary point).

If your $2^n$ is not fixed at compile time, then the cost could still be very small, but not zero. For example, if you’re not sure if you will need to divide by 8 or 16, then – abstractly speaking - you need some way of representing in the hardware which route was taken. The naïve way would be to literally “compute” two separate right-shifts: one right-shift by 3 bits (cost = zero), one right-shift by 4 bits (cost = zero), and then select the one you want using a multiplexer (cost > zero).

That approach is naïve because it blindly assumes that you want your physical bit-width to stay the same and that you want the conceptual binary point to remain in the same place. In some cases, that might be the best design choice, but not in general.

In some cases, it might be better to do no explicit shifting and instead just keep record (in hardware, alongside your original number) of all the implicit left and right shifts that accumulate over time. This is essentially what floating point representation is for (the mantissa records the number and the exponent records the bit-shift). Unfortunately, arbitrary floating-point arithmetic is significantly more expensive than fixed point (addition, for example, costs vastly more). So, again, it’s up to the designer to use it in the right places.
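The mantissa/exponent idea can be seen with Python's standard `math.frexp`, which splits a float into a mantissa in [0.5, 1) and a power-of-two exponent. This is a software illustration only and says nothing about how a particular FPGA floating-point core is built.

```python
import math

m1, e1 = math.frexp(100.75)        # 100.75     = m1 * 2**e1
m2, e2 = math.frexp(100.75 / 8)    # 12.59375   = m2 * 2**e2

same_mantissa = (m1 == m2)         # the "number" part is untouched
shift_recorded = e1 - e2           # the divide by 8 shows up only in the exponent
```

Dividing by 8 leaves the mantissa unchanged and lowers the exponent by 3, which is the recorded bit-shift described above.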

Comment by Rick Lyons, April 2, 2019

Hi weetabixharry. Thanks for your detailed April 2nd post. I've never worked with FPGAs; they appear to have an entirely new level of complexity relative to the more traditional microcontroller and DSP chips. I'm an ol' burned-out retiree now, but if I was a young guy I would immediately start learning everything I could about FPGAs.

Comment by kaz, April 3, 2019

Hi Rick,

As words of "comfort" from a semi-retiree to a retiree: though I haven't done micro-controllers or soft DSP, I expect no difference between FPGA and software when it comes to fixed point computations...except for terminology.

Software tends to describe fractions "as is" or in Q format. FPGA/ASIC uses that or an integer interpretation. Software uses the term "shift" for multiply or divide by 2, but FPGA/ASIC designs may not, as they don't have to shift but rather insert a zero or chop off a bit, unless the data width is meant to stay the same.

I believe both are valid but here is an example where software terminology does a better job. I want a mixer so I generate frequencies from an NCO. The dynamic range is +/-1 but the FPGA has to pre-scale that by 2^n then divide the result of hard mixing by 2^n. Now I want another NCO to generate a frequency of Fs/4 only, in the range +/-1 (1,0,-1,0...etc). This tells me not to pre-scale it but just pass sample1, set sample2 to 0, invert sample3, and so on. If I scale it I am wasting my time and resources.
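A short Python sketch of the Fs/4 case, assuming the NCO produces the cosine sequence 1, 0, -1, 0, ... The function name `mix_fs4` and the test samples are hypothetical.

```python
import math

def mix_fs4(x):
    """Multiply x(n) by cos(pi*n/2) = 1, 0, -1, 0, ... without any multiplier."""
    out = []
    for n, v in enumerate(x):
        phase = n % 4
        if phase == 0:
            out.append(v)       # pass the sample
        elif phase == 2:
            out.append(-v)      # invert the sample
        else:
            out.append(0)       # zero the sample
    return out

x = [3, 7, -2, 5, 11, -4, 6, 1]                  # arbitrary integer samples
mixed = mix_fs4(x)
# Reference: an explicit multiply by the cosine, rounded back to integers
reference = [round(v * math.cos(math.pi * n / 2)) for n, v in enumerate(x)]
```

Because the NCO values are exactly 1, 0, and -1, no coefficient pre-scaling and no final scale-down are needed.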

So in short any fraction given for a design translates up by pre-scaling on FPGAs but any complete integer stays. For a fraction whether we name it as Q format or integer is a matter of taste but eventually that pre-scaled integer must be hardware scaled down to give the fractional weight at result. 

Comment by niarn, April 2, 2019

Hi Rick, I can't provide any references. Maybe because it belongs more under 'system architecture'.

To illustrate what I mean, let's assume an input is 16 bit Q15, meaning one sign bit and 15 fractional bits. The input could come from an ADC or e.g. audio that we load from a wave file. We can store these values in either a short (16 bit) or an int (32 bit).

If they are stored in short data-types then most likely some input attenuation is needed to avoid wrap-around. Unless it is known that there is sufficient headroom in the input.

If they are stored in int data-types then it may be possible to shift up (left shift) the input into some higher precision format, for instance Q24, which still leaves a lot of guard space. In this case the effect of moving a gain from input to output of a linear module may not be so important. It of course comes with some computational overhead.
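A rough Python sketch of the two storage cases, assuming a 1/8 input gain like the one in the blog and an arbitrary positive Q15 sample. Python integers are unbounded, so the 16-bit and 32-bit word sizes are only implied by the values used.

```python
x_q15 = 0b0101_1010_0011_0111      # arbitrary positive Q15 sample (assumption)

# Case 1: stay at 16-bit resolution, apply the 1/8 gain directly
in_16 = x_q15 >> 3                 # three LSBs are discarded
recovered_16 = in_16 << 3

# Case 2: promote to Q24 first (fits in a 32-bit word), then apply the 1/8 gain
x_q24 = x_q15 << 9                 # nine extra fractional bits, all zero
in_32 = x_q24 >> 3                 # only padding zeros are shifted out
recovered_32 = (in_32 << 3) >> 9
```

In the second case the shift removes only zero padding, so the original 15 fractional bits survive the input gain.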

Not sure if this makes any sense.

Comment by Rick Lyons, April 2, 2019

Hi niarn. Thanks.

What you wrote does make sense to me, so long as, when we shift the input up into a Q24 numerical format, all the multiplication coefficients in the system are also in that Q24 format.

Comment by niarn, April 3, 2019

Hi Rick, the format of the multiplier coefs/gains is not so important :) But the representation of all intermediate and final results of the calculations is important for SNR.
