# Sine Lookup Table with Linear Interpolation

Started by March 16, 2013
```On 3/17/2013 7:31 PM, Phil Hobbs wrote:
> On 3/17/2013 6:44 PM, rickman wrote:
>> On 3/17/2013 6:20 PM, John Larkin wrote:
>>>
>>> It would be interesting to distort the table entries to minimize THD.
>>> The
>>> problem there is how to measure very low THD to close the loop.
>>
>> It could also be a difficult problem to solve without some sort of
>> exhaustive search. Each point you fudge affects two curve segments and
>> each curve segment is affected by two points. So there is some degree
>> of interaction.
>>
>> Anyone know if there is a method to solve such a problem easily? I am
>> sure that my current approach gets each segment end point to within &#2013266097;1
>> lsb and I suppose once you measure THD in some way it would not be an
>> excessive amount of work to tune the 256 end points in a few passes
>> through. This sounds like a tree search with pruning.
>>
>> I'm assuming there would be no practical closed form solution. But
>> couldn't the THD be calculated in a simulation rather than having to be
>> measured on the bench?
>>
>
> The simple method would be to compute a zillion-point FFT. Since the
> function is really truly periodic, the FFT gives you the right answer
> for the continuous-time Fourier transform, provided you have enough
> points that you can see all the relevant harmonics.

Ok, as a mathematician would say, "There exists a solution!"

--

Rick
```
```On 3/18/2013 3:01 AM, Robert Baer wrote:
> rickman wrote:
>> On 3/16/2013 8:55 PM, Robert Baer wrote:
>>> rickman wrote:
>>>> I've been studying an approach to implementing a lookup table (LUT) to
>>>> implement a sine function. The two msbs of the phase define the
>>>> sufficient with an 18 bit output. Another 11 bits of phase will give me
>>>> sufficient resolution to interpolate the sin() to 18 bits.
>>>>
>>>> If you assume a straight line between the two endpoints the midpoint of
>>>> each interpolated segment will have an error of
>>>> ((Sin(high)-sin(low))/2)-sin(mid)
>>>>
>>>> Without considering rounding, this reaches a maximum at the last
>>>> segment
>>>> before the 90 degree point. I calculate about 4-5 ppm which is about
>>>> the
>>>> same as the quantization of an 18 bit number.
>>>>
>>>> There are two issues I have not come across regarding these LUTs.
>>>> One is
>>>> adding a bias to the index before converting to the sin() value for the
>>>> table. Some would say the index 0 represents the phase 0 and the index
>>>> 2^n represents 90 degrees. But this is 2^n+1 points which makes a LUT
>>>> inefficient, especially in hardware. If a bias of half the lsb is added
>>>> to the index before converting to a sin() value the value 0 to 2^n-1
>>>> becomes symmetrical with 2^n to 2^(n+1)-1 fitting a binary sized table
>>>> properly. I assume this is commonly done and I just can't find a
>>>> mention
>>>> of it.
>>>>
>>>> The other issue is how to calculate the values for the table to give
>>>> the
>>>> best advantage to the linear interpolation. Rather than using the exact
>>>> match to the end points stored in the table, an adjustment could be
>>>> done
>>>> to minimize the deviation over each interpolated segment. Without this,
>>>> the errors are always in the same direction. With an adjustment the
>>>> errors become bipolar and so will reduce the magnitude by half
>>>> (approx).
>>>> Is this commonly done? It will require a bit of computation to get
>>>> these
>>>> values, but even a rough approximation should improve the max error
>>>> by a
>>>> factor of two to around 2-3 ppm.
>>>>
>>>> Now if I can squeeze another 16 dB of SINAD out of my CODEC to take
>>>> advantage of this resolution! lol
>>>>
>>>> One thing I learned while doing this is that near 0 degrees the sin()
>>>> function is linear (we all knew that, right?) but near 90 degrees, the
>>>> sin() function is essentially quadratic. Who would have thunk it?
>>>>
>>> Sounds like you are making excellent improvements on the standard ho-hum
>>> algorithms; the net result will be superior to anything done out there
>>> (commercially).
>>> With the proper offsets, one needs only 22.5 degrees of lookup ("bounce"
>>> off each multiple of 45 degrees).
>>
>> I can't say I follow this. How do I get a 22.5 degree sine table to
>> expand to 90 degrees?
>>
>> As to my improvements, I can see where most implementations don't bother
>> pushing the limits of the hardware, but I can't imagine no one has done
>> this before. I'm sure there are all sorts of apps, especially in older
>> hardware where resources were constrained, where they pushed for every
>> drop of precision they could get. What about space apps? My
>> understanding is they *always* push their designs to be the best they
>> can be.
>>
> Once upon a time, about 30 years ago, i fiddled with routines to
> multiply a millions-of-digits number by another millions-of-digits number.
> Naturally, i used the FFT and convolution (and bit-reordering) to do this.
> As the number of digits grew, i needed to use a larger N (in 2^N
> samples), so different algorithms were needed in different ranges to
> maximize speed.
> So i had to try a goodly number of published, standard, accepted, used
> routines.
> ALL algorithms for N larger than 5 worked but i was able to speed them
> up anywhere from 10% to 30% by using "tricks" which were the equivalent
> of "folding" around 22.5 degrees.
> "Fold" in half: sin(-x)=-sin(x) or 180 degrees.
> Half again on the imaginary axis for what you are doing for 90 degrees:
> sin(90-x) = sin(90)*cos(x)-cos(90)*sin(x) = -cos(x) exact.

Almost exact, sin(90-x) = cos(x), not -cos(x).  More important is
sin(180-x) = sin(x) from the same equation above.  This is the folding
around 90 degrees.  Or sin(90+x) = cos(x), but I don't have a cos
table... lol

This is the two levels of folding that are "free" or at least very
inexpensive.  It requires controlled inversion on the bits of x and a
controlled 2's complement on the LUT output.

> Half again: sin(45)=cos(45)=2^(3/2)~~.707 carry out as far as you want;
> sin/cos to 45 degrees. sin(x=0..45) direct, sin(x=45..90) =cos(45-x)
> accuracy depending on value of sin/cos(45).

You really lost me on this one.  How do I get cos(45-x)?  If using this
requires a multiply by sqrt(2), that isn't a good thing.  I don't have a
problem with a 512 element table.  The part I am using can provide up to
36 bit words, so I should be good to go with the full 90 degree table.

Separating x into two parts, 45 and x', where x' ranges from 0 to 45,
the notation should be, sin(45+x') = sin(45)cos(x') + cos(45)sin(x') =
sin(45) * (cos(x') + sin(x')).  I'm not sure that helps on two levels.
I don't have cos(x') and I don't want to do multiplies unless I have to.

> That is prolly as far as you want to go with a FPGA, because (as you
> indicated) you do not want to attempt multiplies in the FPGA.
> That last step of "folding" might give better results or better FPGA /
> LUT calcs.
>
> Calculating actual values for the LUT for maximum accuracy: the trig
> series x-(x^3/3!)+(x^5/5!) etc should give better results from 0..90
> than the approx on page 51.
> Do not hesitate to try alternate schemes: sin(2x) = 2*sin(x)*cos(x) and
> sin(0.5*x)=+/- sqrt((1-cos(x))/2) where that sign depends on the

I don't have resources to do a lot of math.  A linear interpolation will
work ok primarily because it only uses one multiply of limited length.
I can share this entire circuit between the two channels because the
CODEC interface is time multiplexed anyway.

I did some more work on the interpolation last night and I put to rest
my last nagging concerns.  I am confident I can get max errors of about
&#2013266097;2 lsbs using a 512 element LUT and 10 bit interpolation.  This is not
just the rounding errors in the calculations but also the error due to
the input resolution which is 21 bits.  The errors without the input
resolution are limited to about 1.5 lsbs.  I expect the RMS error is
very small indeed as most point errors will be much smaller than the max.

--

Rick
```
```On Mar 18, 7:10&#2013266080;pm, Jerry Avins <j...@ieee.org> wrote:
> On 3/17/2013 6:35 PM, rickman wrote:
>
>
>
>
>
>
>
>
>
> > On 3/17/2013 6:23 PM, langw...@fonz.dk wrote:
> >> On Mar 17, 10:56 pm, rickman<gnu...@gmail.com> &#2013266080;wrote:
> >>> On 3/16/2013 8:02 PM, langw...@fonz.dk wrote:
>
> >>>> why &#2013266080;not skip the lut and just do a full sine approximation
>
> >>>> something like this, page 53-54
>
>
> >>> I'm not sure this would be easier. &#2013266080;The LUT an interpolation require a
> >>> reasonable amount of logic. &#2013266080;This requires raising X to the 5th power
> >>> and five constant multiplies. &#2013266080;My FPGA doesn't have multipliers and this
> >>> may be too much for the poor little chip. &#2013266080;I suppose I could do some of
> >>> this with a LUT and linear interpolation... lol
>
> >> I see, I assumed you had an MCU (or fpga) with multipliers
>
> >> considered CORDIC?
>
> > I should learn that. &#2013266080;I know it has been used to good effect in FPGAs.
> > Right now I am happy with the LUT, but I would &#2013266080;like to learn more about
> > CORDIC. &#2013266080;Any useful referrences?
>
> http://www.andraka.com/files/crdcsrvy.pdfshould be a good start. Do you
> remember Ray Andraka?
>

is he still around? I just checked his page, last update was 2008

a quick google finds something that might give a quick view of how it
works
https://github.com/the0b/MSP430-CORDIC-sine-cosine/blob/master/cordic.c

-Lasse

```
```On 3/18/2013 1:44 AM, Robert Baer wrote:
> glen herrmannsfeldt wrote:
>> In comp.dsp rickman<gnuarm@gmail.com> wrote:
>>> I've been studying an approach to implementing a lookup table (LUT) to
>>> implement a sine function. The two msbs of the phase define the
>>> sufficient with an 18 bit output. Another 11 bits of phase will give me
>>> sufficient resolution to interpolate the sin() to 18 bits.
>>
>>> If you assume a straight line between the two endpoints the midpoint of
>>> each interpolated segment will have an error of
>>> ((Sin(high)-sin(low))/2)-sin(mid)
>>
>> See:
>>
>> http://ia601506.us.archive.org/8/items/bitsavers_nationaldaMOSIntegratedCircuits_20716690/1972_National_MOS_Integrated_Circuits.pdf
>>
>>
>> The description starts on page 273.
>>
>> -- glen
> Problem with that..could see only front and back cover.
> Different source perhaps?

It is a lot of document for just two pages.  Try this link to see the
short version...

arius.com/foobar/1972_National_MOS_Integrated_Circuits_Pages_273-274.pdf

--

Rick
```
```On 3/17/2013 8:51 PM, glen herrmannsfeldt wrote:
> In comp.dsp rickman<gnuarm@gmail.com>  wrote:
> (snip)
>>>> I've been studying an approach to implementing a lookup table (LUT) to
>>>> implement a sine function.
>
>>> (snip)
>
>>>> If you assume a straight line between the two endpoints the midpoint of
>>>> each interpolated segment will have an error of
>>>> ((Sin(high)-sin(low))/2)-sin(mid)
>
>>> Seems what they instead do is implement
>
>>> sin(M)cos(L)+cos(M)sin(L) where M and L are the more and less
>>> significant bits of theta. Also, cos(L) tends to be almost 1,
>>> so they just say it is 1.
>
>> Interesting.  This tickles a few grey cells.  Two values based solely on
>> the MS portion of x and a value based solely on the LS portion.  Two
>> tables, three lookups a multiply and an add.  That could work.  The
>> value for sin(M) would need to be full precision, but I assume the
>> values of sin(L) could have less range because sin(L) will always be a
>> small value...
>
> The cos(M)sin(L) is one table, with some bits from each.
>
> Well, when you do the interpolation, you first need to know the
> spacing between points in the sine table. That spacing is
> proportional to cos(x). So, the interpolation table is
> indexed by the high four bits of M and the four bits of L.

Yes, but the error is much larger than the errors I am working with.  I
want an 18 bit output from the process... at least a 16 bit accurate
since the DAC is only ~90 dB SINAD.  I expect to get something that is
about 17-17.5 ENOB going into the DAC.

I don't think the method described in the paper is so good when you want
accuracies this good.  The second LUT gets rather large.  But then it
doesn't use a multiplier does it?

>>> Well, you can fix it in various ways, but you definitely want 2^n.
>
> (snip)
>> I think one of the other posters was saying to add an entry for 90
>> degrees.  I don't like that.  I could be done but it complicates the
>> process of using a table for 0-90 degrees.
>
> I am not sure yet how they do that. I did notice that when adding the
> interpolation value they propagate the carry, so the output can go to
> the full 1.0000 instead of 0.99999. But it will do that before 90
> degrees.

They just meant to use a 2^n+1 long table rather than just 2^n length.
No real problem if you are working in software with lots of memory I
suppose.

>>> Seems like that is one of the suggestions, but not done in the ROMs
>>> they were selling. Then the interpolation has to add or subtract,
>>> which is slightly (in TTL) harder.
>
>>> The interpolated sine was done in 1970 with 128x8 ROMs. With larger
>>> ROMs, like usual today, you shouldn't need it unless you want really
>>> high resolution.
>
>> I'm not using an actual ROM chip.  This is block ram in a rather small
>> FPGA with only six blocks.  I need two channels and may want to use some
>> of the blocks for other functions.
>
> So, the one referenced uses three 256x4 ROMs to get 12 bits of sin(M)
> from 8 bits of M, and then a fourth 128x8 ROM to get five bits to
> add to the low bits of the 12 bit sin(M).

Yep.  About a factor of 35db worse than my design I expect.

One thing I realized about the descriptions most of these designs give
is they only analyze the noise from the table inaccuracies *at the point
of the sample*.  In other words, they don't consider the noise from the
input resolution.  Even if you get perfect reconstruction of the sine
values, such as a pure LUT, you still have the noise of the resolution
of the input to the sine generation.  Just my 2 cents worth...

--

Rick
```
```On Mon, 18 Mar 2013 14:38:53 -0700 (PDT), "langwadt@fonz.dk"

>On Mar 18, 7:10&#2013266080;pm, Jerry Avins <j...@ieee.org> wrote:
>> On 3/17/2013 6:35 PM, rickman wrote:
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> > On 3/17/2013 6:23 PM, langw...@fonz.dk wrote:
>> >> On Mar 17, 10:56 pm, rickman<gnu...@gmail.com> &#2013266080;wrote:
>> >>> On 3/16/2013 8:02 PM, langw...@fonz.dk wrote:
>>
>> >>>> why &#2013266080;not skip the lut and just do a full sine approximation
>>
>> >>>> something like this, page 53-54
>>
>>
>> >>> I'm not sure this would be easier. &#2013266080;The LUT an interpolation require a
>> >>> reasonable amount of logic. &#2013266080;This requires raising X to the 5th power
>> >>> and five constant multiplies. &#2013266080;My FPGA doesn't have multipliers and this
>> >>> may be too much for the poor little chip. &#2013266080;I suppose I could do some of
>> >>> this with a LUT and linear interpolation... lol
>>
>> >> I see, I assumed you had an MCU (or fpga) with multipliers
>>
>> >> considered CORDIC?
>>
>> > I should learn that. &#2013266080;I know it has been used to good effect in FPGAs.
>> > Right now I am happy with the LUT, but I would &#2013266080;like to learn more about
>> > CORDIC. &#2013266080;Any useful referrences?
>>
>> http://www.andraka.com/files/crdcsrvy.pdfshould be a good start. Do you
>> remember Ray Andraka?
>>
>
>is he still around? I just checked his page, last update was 2008

His LinkedIn profile was updated in October of last year.
```
```On 3/18/2013 12:21 PM, Rob Gaddi wrote:
> On Sat, 16 Mar 2013 18:30:51 -0400
> rickman<gnuarm@gmail.com>  wrote:
>
>> I've been studying an approach to implementing a lookup table (LUT) to
>> implement a sine function.  The two msbs of the phase define the
>> sufficient with an 18 bit output.  Another 11 bits of phase will give me
>> sufficient resolution to interpolate the sin() to 18 bits.
>>
>> If you assume a straight line between the two endpoints the midpoint of
>> each interpolated segment will have an error of
>> ((Sin(high)-sin(low))/2)-sin(mid)
>>
>> Without considering rounding, this reaches a maximum at the last segment
>> before the 90 degree point.  I calculate about 4-5 ppm which is about
>> the same as the quantization of an 18 bit number.
>>
>> There are two issues I have not come across regarding these LUTs.  One
>> is adding a bias to the index before converting to the sin() value for
>> the table.  Some would say the index 0 represents the phase 0 and the
>> index 2^n represents 90 degrees.  But this is 2^n+1 points which makes a
>> LUT inefficient, especially in hardware.  If a bias of half the lsb is
>> added to the index before converting to a sin() value the value 0 to
>> 2^n-1 becomes symmetrical with 2^n to 2^(n+1)-1 fitting a binary sized
>> table properly.  I assume this is commonly done and I just can't find a
>> mention of it.
>>
>> The other issue is how to calculate the values for the table to give the
>> best advantage to the linear interpolation.  Rather than using the exact
>> match to the end points stored in the table, an adjustment could be done
>> to minimize the deviation over each interpolated segment.  Without this,
>> the errors are always in the same direction.  With an adjustment the
>> errors become bipolar and so will reduce the magnitude by half (approx).
>>    Is this commonly done?  It will require a bit of computation to get
>> these values, but even a rough approximation should improve the max
>> error by a factor of two to around 2-3 ppm.
>>
>> Now if I can squeeze another 16 dB of SINAD out of my CODEC to take
>> advantage of this resolution!  lol
>>
>> One thing I learned while doing this is that near 0 degrees the sin()
>> function is linear (we all knew that, right?) but near 90 degrees, the
>> sin() function is essentially quadratic.  Who would have thunk it?
>>
>> --
>>
>> Rick
>
> Just one note, since you're doing this in an FPGA.  If your look up
> table is a dual-port block RAM, then you can look up the the slope
> simultaneously rather than calculate it.  sin(x)/dx = cos(x) = sin(x +
> 90 deg).

That is an interesting point, but this is actually harder to calculate
with than the difference.  First, cos(x) is sin(90-x).  Second, where we
are looking for the sin(M+L) (M is msbs and L is lsbs) the linear
interpolation becomes,

delta = sin(M+1)-sin(M) * L

This is a simple multiply.

To do this using the cos function it becomes,

delta = sin(1/Mmax) * cos(M) * L

sin(1/Mmax) is a constant to set the proper scaling factor.  With two
variables and a constant, this requires two multiplies rather than one
multiply and one subtraction.  As it turns out my LUT has lots of bits
out so I may just store the difference eliminating one step.  It depends
on how I end up designing the actual circuit.  The ram can either be a
true dual port with 18 bits or a single port with 36 bits.  So if I need
the dual port, then I might not want to store the delta values.  Either
way, this is no biggie.

--

Rick
```
```On 3/18/2013 1:42 PM, Kevin Neilson wrote:
> Rickman,
> 1.  I did a quarter-wave table once, and I put in the phase offset exactly like you said.  That is, the 0th index wasn't sin(0), but rather something like sin(2*pi/(2*N)) where N is the number of elements in the table.  I didn't find that in any references either.  That also helps because since you don't have to store the value -1 in the ROM so you don't have to scale the whole thing to avoid having the extra bit you'd need to store the value -1.

Wow, Thunderbird shows each of your paragraphs as one line going off the
right side of the screen and into the next room!  I have to read the
thread display to see what you wrote... lol

Actually, the 1/2 delta offset is a bit different if you are
interpolating.  It has to be 1/2 of the lsb of the input word, not 1/2
lsb of the table address.  But I think this helps with a couple of
problems as you have noted.

> 2.  I'm doing another table now, but I'm just doing a full-wave.  If you are using a Xilinx, you can do a 1024*18 lookup in a single BRAM.  (But it sounds like you may be using an Actel or something if you don't have any multipliers.)

I am using a Lattice part which is very similar to the Xilinx parts.  It
has 512x36 block rams that can only be 18 bits wide if used as dual
ports.  They are very much like the Xilinx BRAM.

> 3.  To answer your question about the old days, I think they used CORDICs, which get you great accuracy with the fewest gates.  But it does take you about a clock cycle per bit of resolution, which is time you probably don't have.  They didn't have blockRAMs either.

Yes, I have never dug into the CORDIC algorithm in detail.  I should do
that.  Actually this design has lots of time.  The CODEC is delta-sigma
and will be used at a 128x clock rate.  The LUT is easy to implement and
I have done all the design work now.  But I should use this opportunity
to learn the CORDIC.

> 4.  Like Rob said, you already have the derivative of sine stored in the table, so you don't actually have to do a subtraction to find the slope.  If you want more accuracy, you can do a 2nd-order Taylor, because you also have the 2nd derivative already in the table.  However, this takes another multiplier.  I think the Xilinx CoreGen core does this.

The cos is the derivative of the sin, but a scale factor is needed.
That adds a multiply.  Subtracting is easy or I may store it if I don't
use the dual port of the RAM.

> 5.  Make sure you're doing the kind of rounding you want.  Sometimes something like Matlab does a round toward +inf when really you want a round away from zero.

Yeah, that really isn't rounding, is it?  I am using Open Office Calc
which has separate functions for floor, ceiling and round.  It is giving
me what I want *and* what I need.  lol

> 6.  I'm using VHDL now and am doing the whole thing in HDL.  That is, I'm using the math and sfixed library to calculate the ROM values so I don't need Matlab or any preprocessing, which makes it all cleaner.  I don't know if the Verilog math functions are supported by the tools yet, but you could still do the same thing with your own Maclaurin function.

Once I am done evaluating the algorithm, which I think I am, I will come
up with a formula to encode this into a function.  Then VHDL ROMs can be
initialized with the function call.  I just want to do it in a way that
it doesn't take a hit on the performance, but that should not be hard if
I make it part of the reset clause of a process defining the ROM/RAM.

Thanks for the advice.  If you would like to share any code I'd be happy
to send you a copy.

--

Rick
```
```On 3/18/2013 7:01 PM, Jamie wrote:
> rickman wrote:
>>
>> It is a lot of document for just two pages. Try this link to see the
>> short version...
>>
>> arius.com/foobar/1972_National_MOS_Integrated_Circuits_Pages_273-274.pdf
>>
> isn't it "fubar" ?
>
> Jamie

This is the French spelling.  Make sure you put the accent on the second
syLAble.

--

Rick
```
```rickman <gnuarm@gmail.com> wrote:

(snip)
> Yes, I have never dug into the CORDIC algorithm in detail.  I should do
> that.  Actually this design has lots of time.  The CODEC is delta-sigma
> and will be used at a 128x clock rate.  The LUT is easy to implement and
> I have done all the design work now.  But I should use this opportunity
> to learn the CORDIC.

In that case, you want CORDIC. The LUT is when you need one every cycle,
more or less.

If I remember it right, I found a more modern reference to sine LUT that
generates one in 300ps. Definitely for those in a hurry to see results.

-- glen
```