# Introduction

Most signal-processing-intensive applications on FPGAs still rely on integer or fixed-point arithmetic, yet the key ideas behind quantization, fixed-point, and integer arithmetic are not easy to find in one place. In a series of articles, I aim to clarify some concepts and add examples of how things are done in real life. The ideas covered are the result of my professional experience and hands-on projects.

In this article I will address the most fundamental question you need to ask to start building experience in this domain: what is fixed-point? The article won't go through complicated mathematical equations. The intention is to build intuition from simple concepts we already know. I hope it's useful for some DSP lovers out there!!

# Fixed-Point

What is fixed-point after all? Our everyday life has nothing to do with fixed-point. When we go to the supermarket we ask for 1.2 pounds of bread, or we pay 2.4 dollars for something. We never say "here you have your 134532523 dollars, but remember it's Q3.15". Life is real, and so are the numbers that represent it in our mind.

Real numbers are a mathematical entity and are full of meaning for us. Integer numbers are also a mathematical entity, and we can understand them well. What are fixed-point numbers after all? Well, to put it simply, fixed-point is a path, a way to go back and forth between real numbers and integers. This path is necessary for a reason: designing filters, solving equations, and pretty much everything else we do happens in the world of real numbers, while FPGA or ASIC implementations, even on very powerful devices, will most likely use integer operations.

Let's dive into the details of fixed-point, starting with a simple example. Say we have an ADC that outputs 16-bit signed numbers. These numbers go all the way from -32768 to 32767, which is nothing but $-2^{15}$ to $2^{15}-1.$ Now we want to implement a digital circuit in our FPGA that applies a gain, which can be either positive or negative. Let's describe in words what we want, and then with some luck we will end up understanding how to manipulate the numbers to make it work.

I want my gain block to be configurable from $-1$ to $+1.$ Everybody understands this. No more words needed. A gain of $-1$ will flip the signal (think of a scope capture), and a gain of $+1$ will leave it unchanged. Additionally, the client says "I want to have at least 50 different settings for my gain". Things become a bit more complicated. I need to split my entire range into at least 50 parts or sections. As engineers we are close relatives of powers of two, so we say "you know what, for the same price you will get 64 steps". Of course, that is because we know 64 is $2^6.$ What we are saying here is that the gain will be a 6-bit number.

So far, so good. We have our 16-bit number coming in, and the 6-bit gain that will be applied. The mapping seems obvious for the gain, but we can still write it down to make it clear:

• $-1$: mapped to -32, the most negative number we can represent.
• $+1$: mapped to 31, the most positive number we can represent.
• Step: $2/64=0.03125$, which is the range divided by the number of steps.

A few details here. We cannot represent $+1.$ Instead, the largest gain we have is $0.03125\times 31$, which is $0.96875.$ The step speaks for itself: it is the smallest difference we can resolve between two consecutive codes.
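The mapping above can be sketched in a few lines of Python. This is only an illustration; the names `GAIN_BITS`, `gain_step` and `gain_code_to_real` are made up for this sketch and do not come from any library.

```python
# Sketch of the 6-bit gain mapping described above (names are illustrative).
GAIN_BITS = 6

# Range width (+1 minus -1 = 2) divided by the number of codes (2**6 = 64).
gain_step = 2.0 / 2**GAIN_BITS

# Signed two's-complement code limits for 6 bits.
gain_min_code = -(2 ** (GAIN_BITS - 1))     # -32
gain_max_code = 2 ** (GAIN_BITS - 1) - 1    # 31


def gain_code_to_real(code: int) -> float:
    """Go from the integer gain code back to the real-valued gain."""
    return code * gain_step


print(gain_step)                          # 0.03125
print(gain_code_to_real(gain_min_code))   # -1.0
print(gain_code_to_real(gain_max_code))   # 0.96875
```

The last line shows the detail mentioned in the text: the largest code gives $0.96875$, not $+1.$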

# Completing the example

We have already built the tools to understand most of the concepts, believe it or not. So let's put some numbers on it and see if it works. I told you that fixed-point is just a path to move back and forth between real and integer numbers. Let's see this powerful idea in action.

Let's say the ADC voltage range is $\pm 1$ V. Now suppose the actual input is $0.234$ V and the desired gain is $0.3.$ We can do an easy calculation to obtain the output voltage after the gain, which is $0.234\times 0.3=0.0702.$ This is what we should get, so let's try to get the same result by applying our gain block.

We calculated the step to be $0.03125$ for the gain, which means that we can now compute how many steps we need to approach $0.3,$ the desired gain. This is $0.3/0.03125=9.6.$ Unfortunately we won't be able to use $9.6$ steps: we either use 9 or 10. Let's say we choose 10 steps, which gives a quantized gain of $10\times 0.03125=0.3125.$ As you can see, some error shows up, and that's the way it is.
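To make the choice between 9 and 10 steps concrete, here is a small sketch that compares the candidates. The function name `quantize_gain` and the returned structure are assumptions made for this illustration only.

```python
import math

GAIN_STEP = 2.0 / 2**6  # 0.03125, the gain step computed earlier


def quantize_gain(desired: float) -> dict:
    """Return (code, quantized gain, error) for three rounding choices."""
    steps = desired / GAIN_STEP  # 9.6 for a desired gain of 0.3
    codes = {
        "floor": math.floor(steps),
        "ceil": math.ceil(steps),
        "nearest": round(steps),
    }
    return {
        name: (code, code * GAIN_STEP, code * GAIN_STEP - desired)
        for name, code in codes.items()
    }


result = quantize_gain(0.3)
for name, (code, gain, error) in result.items():
    print(f"{name:8s} code={code:3d} gain={gain:.5f} error={error:+.5f}")
```

For a desired gain of $0.3$, rounding to the nearest code picks 10 (a gain of $0.3125$), the same choice made in the text; 9 steps would give $0.28125$, which is further from the target.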

Here we need a bit of theory. Let's build the path from real numbers to integer numbers by using fixed-point for the ADC. We said the voltage range is $\pm 1$ V, and the digital range is -32768 to 32767. The mapping to go from 1 to 32768 is a multiplication by $2^{15},$ which is nothing but the fixed-point representation of the integer number. Let's put this in words. We have an integer number that uses 16 bits, and we divide it by some power of 2 to end up with a real number, which we know how to handle. The magic number is $2^{15}$ in this case. The theory will call this "1 bit for the integer part, and 15 bits for the fractional part". Some will call this Q1.15 or give it some other fancy name, but now that you understand what is going on, we don't need names.
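The path between volts and ADC codes can be written down as a pair of helper functions. This is a minimal sketch assuming an ideal ADC with a $\pm 1$ V range, rounding to the nearest code and saturating at the limits; the function names are invented for the example.

```python
ADC_FRAC_BITS = 15              # "15 bits for the fractional part"
ADC_SCALE = 2 ** ADC_FRAC_BITS  # the magic number, 32768
ADC_MIN, ADC_MAX = -32768, 32767


def real_to_adc_code(x: float) -> int:
    """Real -> integer: scale by 2**15, round, and saturate to 16 bits."""
    code = round(x * ADC_SCALE)
    return max(ADC_MIN, min(ADC_MAX, code))


def adc_code_to_real(code: int) -> float:
    """Integer -> real: divide by 2**15."""
    return code / ADC_SCALE


print(real_to_adc_code(0.234))   # 7668
print(adc_code_to_real(7668))    # about 0.234009, not exactly 0.234
print(real_to_adc_code(1.0))     # 32767, exact +1 is out of range
```

Note that going to an integer and back does not return exactly $0.234$: the ADC data carries its own quantization error, just like the gain.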

The same idea can be applied to the gain, where we used 5 bits for the fraction and 1 bit for the integer part, so $\pm 1$ maps to -32 to 31 (remember that exactly $+1$ is left out). Back to the math: we fixed the ADC input at $0.234,$ which translates to 7668, because $0.234\times 2^{15}=7667.712$ rounds to 7668 (please do the math, calculating the step as in the gain example). The calculation is as follows:

$7668\times 10=76680,$

where 7668 is the voltage in ADC steps and 10 is the gain in gain steps. What can we do with this number? Google around for 5 minutes and you will find the rules for fixed-point multiplication: the number of bits are added, everywhere. We originally had 1 bit for the integer part of both the ADC and the gain, and we have 15 bits of fractional part for the ADC and 5 for the gain. So the result must have 2 bits for the integer part and 20 bits for the fractional part. Let's try it. The path from integer to real given by fixed-point says we need to divide our result by $2^{20}.$ This should lead us back to the real world. I'll let you do the math, but you can easily verify that the result we get is $0.07312775.$ This is not quite the $0.0702$ we got in our real-life calculation, but we know that both the gain and the ADC data carry some quantization error.
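The whole calculation fits in a few lines of Python, which makes it easy to check the numbers. The variable names are illustrative only; the values are the ones used in the text.

```python
ADC_FRAC_BITS = 15
GAIN_FRAC_BITS = 5

adc_code = 7668    # 0.234 V expressed in ADC steps
gain_code = 10     # 0.3 quantized to 10 gain steps (0.3125)

# Integer multiplication, which is what the hardware multiplier does.
product = adc_code * gain_code                        # 76680

# Fractional bits add up: 15 + 5 = 20.
product_frac_bits = ADC_FRAC_BITS + GAIN_FRAC_BITS

# Path back to the real world: divide by 2**20.
result = product / 2**product_frac_bits

ideal = 0.234 * 0.3                                   # real-number reference

print(product)              # 76680
print(round(result, 8))     # 0.07312775
print(f"error vs ideal: {result - ideal:+.6f}")
```

The integer result, scaled back by $2^{20}$, is exactly the product of the two quantized values $7668/2^{15}$ and $10/2^{5}$, so the whole difference from $0.0702$ comes from quantizing the inputs, not from the multiplication.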

As a last word, I will give you a little bit of homework. What if the ADC range is $\pm 5$ V but it's still 16-bit? Do you think the gain block will change at all? Well the answer is no, and I'm pretty sure you already know why.
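One way to convince yourself is the sketch below. It assumes an ideal ADC whose codes scale linearly with the full-scale voltage, and the helper `volts_to_code` is a hypothetical name used only here.

```python
ADC_SCALE = 2**15


def volts_to_code(volts: float, full_scale: float) -> int:
    """Map a voltage to a 16-bit code for an ADC with range +/- full_scale."""
    return round(volts / full_scale * ADC_SCALE)


# The same fraction of full scale (23.4 %) on two different ADC ranges.
code_1v = volts_to_code(0.234, full_scale=1.0)
code_5v = volts_to_code(1.17, full_scale=5.0)

print(code_1v, code_5v)   # 7668 7668

# The gain block only ever sees the integer code, so its output is the same.
gain_code = 10
print(code_1v * gain_code == code_5v * gain_code)   # True
```

Only the final conversion from codes back to volts differs between the two cases; the integer arithmetic inside the FPGA is identical.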

# Conclusion

The intention of this article is to walk through some numbers and define fixed-point as a way of going from real to integer numbers. The world is real, and so are the numbers that represent it. Computation and signal processing blocks are implemented using integer arithmetic, especially in FPGA and ASIC implementations. Understanding how to go from real numbers to integers is crucial when implementing solutions on these platforms.

Stay tuned!! This is the first article of a series where I will cover some other fun ideas and concepts to make complex things work in real life.

Comment by January 27, 2023 There is some confusion in this article between real numbers and floating-point representation. They are very different things. In this context, real numbers are analog and floating-point numbers are digital. Therefore, the way you talk about "real" in the Conclusion is correct, but everywhere else it is not. Floating-point numbers have finite precision and aren't so different from fixed-point numbers (the main difference being that their exponent is explicit and can therefore "float" around, instead of staying "fixed" to an implicit value). There are plenty of other important differences (particularly if we're talking about IEEE754), but that's the key one here.

If you want your gain to go from -1 to +1 with at least 50 steps, then you need 7 bits (signed two's complement): -1.0 = "11.00000" and +1.0 = "01.00000". You may think that 0.96875 seems quite close to 1.0, but your specification was 1.0. So either the specification was wrong, or the implementation is.

Your 5 minutes of Googling regarding fixed-point multiplication has led you to the wrong answer. This is not always true: "the number of bits are added, everywhere". This can be confirmed analytically, but it's not straightforward. Therefore, I would encourage you to exhaustively multiply pairs of fixed-point numbers across all possible values for each number, for a range of different fixed-point representations. (Use small bit-widths to keep computation time manageable). Then check the necessary and sufficient number of bits to represent the result of the multiplication in each case. You may be amazed to see how many tricky edge cases there are that break the "rule" you found by Googling.

One very trivial example is if you multiply two 1-bit signed integers together. Do you need 1+1=2 bits to represent the result? No. The result can only be 0 or 1, which only needs 1 bit (unsigned) to represent.
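The exhaustive check suggested above could look something like the following sketch for signed two's-complement integers. The helper names are invented for this illustration, and it only covers plain signed operands of small width.

```python
def signed_bits_needed(lo: int, hi: int) -> int:
    """Smallest signed two's-complement width that holds every value in [lo, hi]."""
    n = 1
    while not (-(2 ** (n - 1)) <= lo and hi <= 2 ** (n - 1) - 1):
        n += 1
    return n


def product_range(n_bits: int, m_bits: int, skip_min_pair: bool = False):
    """Min and max of a*b over all signed n-bit a and signed m-bit b."""
    a_min, b_min = -(2 ** (n_bits - 1)), -(2 ** (m_bits - 1))
    products = [
        a * b
        for a in range(a_min, 2 ** (n_bits - 1))
        for b in range(b_min, 2 ** (m_bits - 1))
        if not (skip_min_pair and a == a_min and b == b_min)
    ]
    return min(products), max(products)


for n in range(2, 5):
    for m in range(2, 5):
        full = signed_bits_needed(*product_range(n, m))
        trimmed = signed_bits_needed(*product_range(n, m, skip_min_pair=True))
        print(f"{n}x{m} bits: all pairs need {full}, "
              f"without min*min need {trimmed}")
```

For these widths, the full $n+m$ bits are only required by the single pair where both operands take their most negative value; every other pair fits in one bit less.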

Comment by January 27, 2023 Hi there,

Thanks for your comments!! Honestly, I meant to write a very introductory example for those who don't have much experience in the field. I don't know why you introduced floating-point, as I never mentioned it in the article.

Regarding your multiplication fact, I can tell you that if you always keep that extra MSB in real-life applications, you will keep losing a ton of precision in the long term. Multiplying -1 x -1 sure needs one extra bit for representing +1, but good practice won't keep this extra bit and will invest it better for added precision.

As for the +1/-1 example: yes, it is true that you cannot reach +1. This example comes from a real-life case, so the ADC output, which is my actual input, does not reach the most positive voltage either. So when you say the range is +1 V to -1 V, that is not totally true, I know. It is common practice to say "I have x bits and my range is -1/+1" when you actually mean -1 to +0.999 or whatever precision you have. You will never pay for that extra bit just to leave it unused 99 % of the time.

Again, the article is not intended to be exhaustive, just an example for people without experience that may get confused easily. And also to stress the fact that you can implement very complex signal processing things without having to use expensive floating-point operators!!

I hope I can write more with hands-on examples on FPGA!
