I have a need to raise two to the power of a fixed point number, for example 2**(Q31(-0.4)). Does anyone know an efficient way this I can implement this to a reasonable precision, say 12 bits, without making large lookup tables(i.e. greater than 16 elements long)? It would seem like the 2 would help things. The input range can be constrained to -1 to 0, i.e. the output will always fit in a Q31 as well. Thanks for any advice.

This is the documentation of a Fixed Point package that I have created some years ago.

It describes power of 10 - but the things are correlated.

It is based on the exp() function.

Which could be described in a similar way.

## 1.1Power of ten function (Exp10)

### 1.1.1Algorithm

There is no “algorithm” for the power of ten, since the function is based over a trivial identity:

10^{x} = e^{x ln 10}

And so, other than setting a different Taylor series expansion (which is possible, anyhow), with tests and maintenance required again, by just paying one more MPY (x * ln 10) we get a new function.

ln 10 is – of course – a constant.

And since the function is fast, indeed, there is no problem in paying more time for a multiplication.

The problem might actually be in the range check: the admitted range for Exp10 must necessarily comply with the ranges of both the MPY function and the native Exp function.

### 1.1.2Ranges

Due to Exp dependency, “home” is where input operands are close to 0.

The new required range for Exp10 is -24/20,…, +24/20.

This becomes (after being multiplied by ln (10))

-2.76 < x < 2.76

As input to the Exp ( ).

This is HIGHER than the formal input range of the Exp function.

But not high enough as to convince us having a separate Taylor series for this function.

Since the originally declared input range for this function was not -24/20,…,24/20 but rather

-12/20,…,12/20, I suggest to

- double check the necessary input range for the Exp10,

- verify the number of iterations necessary to get -2.76,…,2.76 range,

and then

- analyze whether to always increase the iterations number for the Exp function, or having an optional parameter Exp(x, iter=8) defaulting to normal iterations count and instantiated to 10/11/…. or whatsoever when called from inside Exp10.

### 1.1.3Devices and Example

No need for examples, I believe, since the function is actually a mere juxtaposition of elementary ones.

### 1.1.4Resources

The cost of an Exp function plus a MPY by a constant.

The constant belongs to <2,…, 3> interval, so attention must be taken to Scenery and normalization issues.

### 1.1.5Extensions

It is not likely, but a rewriting of the Exp10 function in terms of a native Taylor series could be necessary, and the possibility must be taken in mind.

Maybe even differentiating the function kernel into two separate intervals could be an option.

You can always optimize a polynomial implementation over a given range. I would recommend Chebyshev approximation, especially since you have the obvious -1 to 0 range. Don't use Taylor series, they are not optimal except around one point.

You're probably looking at a 3rd / 4th / 5th order polynomial.

See my article: https://www.embeddedrelated.com/showarticle/152.php -- I didn't evaluate 2**x but I did evaluate e**x in the [0,1] range for the Chebyshev coefficients up to degree 5, and they are

c0 | c1 | c2 | c3 | c4 | c5 |

1.7534 | 0.85039 | 0.10521 | 0.0087221 | 0.00054344 | 2.7075e-05 |

so a 3rd order polynomial should get you a max error around c4 = 0.0005 = 11 bits, and a 4th order polynomial should get you a max error around c5 = 15 bits.

In this stackexchange answer are the coefficients for some short finite-order polynomials for evaluating (at a pretty good accuracy) a bunch of transcendental functions including the base 2 exp.

https://dsp.stackexchange.com/questions/20444/books-resources-for-implementing-various-mathematical-functions-in-fixed-point-a/20482#20482

To implement this in fixed-point requires some care with ranges and scaling. Instead of Q31 (or another notation is S1.31), I might suggest 4 bits left of the binary point. (or S4.28). You need to establish the maximum value of the result of the 2^x function and scale the coefficients accordingly. Always scale down intermediate results with the right shift operator (">>" in C), so your coefficients will be integers that are scaled up by some power of two.

Thank you all for the ideas, I'll hopefully be able to get back with the solution I went with.

$$ 2^x = 2^{n+f} = 2^n 2^f = 2^n e^{\ln(2)f} = 2^n L e^{\ln(2)f - ln(L)} $$

$$ y = \ln(2)f - ln(L) \approx 0 $$

$$ 2^x = 2^n L e^{y} = 2^n L \left[ 1 + y + y^2/2 + y^3/6 + y^4/24 ... \right] $$

So you need a lookup table of $L$ with $\ln(L)$ values. Since f is between 0 and 1, then the range of $\ln(2)f$ is about $ 0 \to 0.69$, if you build your table with fourteen entries you should always be able to find an entry within a thirtysecond (five significant bits) of $\ln(2)f$. Therefore, you should only need to take the first four terms of the series to get your desired accuracy. It's only the fourth term and beyond that need a division, the first three terms can be calculated with multiplications, additions, and shifts.

You may want to increase the size of your lookup table so you only need to calculate the first three terms if you want the fastest speed. Of course the last 2^n times can be done by shifting as well.

You can index your lookup table on the first few bits of $\ln(2)f$, so the corresponding lookup value is ln(L) = leading bits + 1/2 the last bit place value, and the L is the exp of that.

For sample code in Gambas to show how it works (but not how to implement fixed point), I've posted an example at the Gambas.One forum.

See:
DSP: Power of Two Approximation

You could use the Hyperbolic Cordic algorithm to compute e^x, then use the identity of 2^x = e^(x*ln(2)).

Brian

it just seems to me, Brian, that especially for a fixed-point implementation, that 2^x is the simplest primitive and then use A^x = 2^(x*log2(A)).

When it's base 2, you can separate the integer and fractional parts of x, compute 2^fract_part(x) and then shift that result to the left by int_part(x) bits (or right if it's negative).

Not knowing the target platform and design constraints, a polynomial and shift operation like you suggest might very well be the best solution, that is a method I have used myself, I was merely offering an alternative. The Cordic algorithm is relatively low complexity using only shifts and adds, and e^x is one of the functions that can be computed directly, so it might make sense for their application. Since they appear to be resource constrained (wanting to keep the LUT <= 16 elements), I offered one of the solutions I have used in the past (implemented in an FPGA) that had not been mentioned already, something they could add to the trade space if it made sense.

Brian

but can't you develop a Cordic operation based on 2^x instead of e^x just as efficiently? there might be an internal scaling operation in there that would be different for 2^x.

the issue, to me, is that if you break x into integer and fractional parts: x = i + f, then 2^f can be implemented to a good precision for 0 <= f < 1 .

Yes, I think you can scale the cordic operations to have the ln(2) scaling built in getting the 2^x directly (I needed e^x, so didn't work on that optimization).

Breaking the operation into x = i + f makes, this is just an alternate way to compute the 2^f part that uses no multiplies, just iterates using shifts, add/sub, and a small look up table. You can control the precision by increasing iterations and LUT size, so it is pretty easy to get the 12 or more bits precision desired.

Hi rbj,

Did you see what I did above?

Ced

i do now. but it looks a little Maclaurenish to me. since, ultimately, the power series will have to be finite (which is a polynomial), you will want (slightly) different coefficients on each power of "y" in the finite-order polynomial than you would get from the Maclauren series having an infinite number of terms. If what you're trying to get is to minimize the maximum error (perhaps the error is weighted in some meaningful manner), then you want to arrive at those coefficients from some numerical optimization such as Remes Exchange Alg.

Using the Lookup table is, in effect, converting the Maclaurin series at a value to a Maclaurin series at near zero, so it converges faster.

Edit: It will only cost two extra multiplications as the others can be combined with the existing coefficients ahead of time.

look for one more level higher than the 2**x. ie, typically the x value must be returned by another fixed-point operation, most of the cases involving log - for dimensional reasons.

let's say x = f(t). then the new expression is y = 2**f(t). try to find out the range of t. once the range is within the limit, apply curve fitting. if the accuracy requirement is higher split the range into different bands and apply fit. most practical cases, 2nd order poly fit should suffice.

alternately, once can also go in the other direction where y is used in another expression, say z = g(y). then the overall expression is z = g(2**x). try the curve fit range.

i remember there was a case where the just keeping different LUT against 't' solve the problem ! original floating point code was referring x from table against different indices.

in general looking around the primitives will help. ie, rather than converting all primitives directly to fixed-point try to see how they are used together. this would need higher level of analysis and sometime more number of LUTs and functions. but once can get higher accuracy for the range she/he is looking for.

Results time, I went with a polynomial approximation. For those interested

These are the python I used to generate the coefficients and the C code I tested them on.

The results: (abs_max_diff = the max difference between the fixed point integer implementation and a double precision reference, MSE = mean squared error across all possible input values of interest to me [-1.0, 0.0]

__2nd order__

max_abs_diff: -53.65372dB -8.91 bits

MSE: -62.52919dB -10.39 bits

__3rd order__

max_abs_diff: -79.87001dB -13.27 bits

MSE: -89.74021dB -14.91 bits

__4th order__

max_abs_diff: -108.23040dB -17.98 bits

MSE: -118.90955dB -19.75 bits

__5th order__

max_abs_diff: -138.25612dB -22.96 bits

MSE: -149.67265dB -24.86 bits

__6th order__

max_abs_diff: -168.05046dB -27.91 bits

MSE: -179.93220dB -29.89 bits

__7th order__

max_abs_diff: -174.21027dB -28.94 bits

MSE: -188.13559dB -31.25 bits

After 7th the arithmetic precision starts to decrease the performance. I implemented this on a 32 bit machine with a 64 bit accumulator(hence the included C implementation)

For my purposes, 3rd order would suffice but the assembly implementation was so fast I will probably go higher. Thanks again all.

Just kidding. If it is fast enough, generally simpler is better. Thanks for the followup letting us know the outcome.