On 3/18/13 12:56 AM, dbd wrote:> For an approach with easier error estimation you could use the three table sine/cosine generator that implements the trig identities: > > Sum formulas for sine and cosine > sin(w + deltaw) = sin(w) * cos(deltaw) + cos(w) * sin(deltaw) > cos(w + deltaw) = cos(w) * cos(deltaw) � sin(w) * sin(deltaw) > > The tables required are: > #1 one quadrant sin(w)/cos(w) table of 2^M entriesi don't understand what this table is for at all? tan(w)? what do you do with it?> #2 sin(deltaw) table of 2^N entries > #3 cos(deltaw) table of 2^N entries > (#3 only if cosine is generated as well as sine)you need both tables, even if you only want to generate the sine, using this method. to use the above method, which is equivalent to e^(j*(w + deltaw)) = e^(j*w) * e(j*deltaw) , you need both sine and cosine signals (the same as the imaginary and real parts of the signal above). likely you want to initialize cos(w)=1 and sin(w)=0 for an initial w of 0. also, to be clear: deltaw = 2*pi*f0/Fs as far as the cos(deltaw) table, *that* can be taken from the sin(deltaw) table (going backwards) if the entries are equally spaced in both cases. if there is a smaller table with just a selection of pertinent deltaw, then you might need a pair of tables for the reduced number of entries.> This will calculate the equivalent of a 2+M+N entry sine/cosine table. The only error sources in the calculated results are the errors in generating/representing the values in the tables and the accuracy of implementing the adds and multiplies in the trig identity.i have done this method of quadrature sinusoid generation (usually for a frequency shifter application) on a few different occasions using both fixed and floating-point processors. essentially, you are doing a complex multiplication (by e^(j*deltaw) for each samples. you will be representing the real and imaginary parts of that complex coefficient with finite-precision words. especially in the fixed-point case, you must worry about the magnitude of this complex coefficient, e(j*deltaw). it should be 1, but if, due to roundoff, it is less than 1, your quadrature sinusoid will spiral down to have virtually 0 amplitude. if it is greater than 1, your quadrature sinusoid will grow in amplitude until it hits some limits. if you're careful and the cos(deltaw) and sin(deltaw) are rounded up in magnitude so that, by the smallest of millismidgens, they square and add to slightly greater than 1, then the amplitude, if not already at the rails, will grow slowly until it hits the rails. but this is not so bad. just like an analog oscillator were the loop gain exceeds 1, it is the clipping at the maximum amplitude that defines the amplitude. to do this correctly, you should AGC the output from amplitude of the existing sinusoid which can be computed from each sample from A^2 = (sin(w))^2 + (cos(w))^2 A^2 should be 1 (obviously), but because of the finite precision, might be a little smaller or larger than 1. in the past, we discussed this here at comp.dsp and a simple adjustment to the gain that would be attached to the cos(deltaw) and sin(deltaw) coefs would be 1 + alpha*(1 - A^2) where 0 < alpha < 1/2 if alpha is small, the gain compensation is slow, that might be good. if alpha is 1/2, the gain adjustment is immediate. this might be the way to do it with a floating-point chip. -- r b-j rbj@audioimagination.com "Imagination is more important than knowledge."

# Sine Lookup Table with Linear Interpolation

Started by ●March 16, 2013

Reply by ●March 18, 20132013-03-18

Reply by ●March 18, 20132013-03-18

On Sat, 16 Mar 2013 18:30:51 -0400 rickman <gnuarm@gmail.com> wrote:> I've been studying an approach to implementing a lookup table (LUT) to > implement a sine function. The two msbs of the phase define the > quadrant. I have decided that an 8 bit address for a single quadrant is > sufficient with an 18 bit output. Another 11 bits of phase will give me > sufficient resolution to interpolate the sin() to 18 bits. > > If you assume a straight line between the two endpoints the midpoint of > each interpolated segment will have an error of > ((Sin(high)-sin(low))/2)-sin(mid) > > Without considering rounding, this reaches a maximum at the last segment > before the 90 degree point. I calculate about 4-5 ppm which is about > the same as the quantization of an 18 bit number. > > There are two issues I have not come across regarding these LUTs. One > is adding a bias to the index before converting to the sin() value for > the table. Some would say the index 0 represents the phase 0 and the > index 2^n represents 90 degrees. But this is 2^n+1 points which makes a > LUT inefficient, especially in hardware. If a bias of half the lsb is > added to the index before converting to a sin() value the value 0 to > 2^n-1 becomes symmetrical with 2^n to 2^(n+1)-1 fitting a binary sized > table properly. I assume this is commonly done and I just can't find a > mention of it. > > The other issue is how to calculate the values for the table to give the > best advantage to the linear interpolation. Rather than using the exact > match to the end points stored in the table, an adjustment could be done > to minimize the deviation over each interpolated segment. Without this, > the errors are always in the same direction. With an adjustment the > errors become bipolar and so will reduce the magnitude by half (approx). > Is this commonly done? It will require a bit of computation to get > these values, but even a rough approximation should improve the max > error by a factor of two to around 2-3 ppm. > > Now if I can squeeze another 16 dB of SINAD out of my CODEC to take > advantage of this resolution! lol > > One thing I learned while doing this is that near 0 degrees the sin() > function is linear (we all knew that, right?) but near 90 degrees, the > sin() function is essentially quadratic. Who would have thunk it? > > -- > > RickJust one note, since you're doing this in an FPGA. If your look up table is a dual-port block RAM, then you can look up the the slope simultaneously rather than calculate it. sin(x)/dx = cos(x) = sin(x + 90 deg). -- Rob Gaddi, Highland Technology -- www.highlandtechnology.com Email address domain is currently out of order. See above to fix.

Reply by ●March 18, 20132013-03-18

On Monday, March 18, 2013 8:51:48 AM UTC-7, robert bristow-johnson wrote:> On 3/18/13 12:56 AM, dbd wrote: > ... > > For an approach with easier error estimation you could use the three table sine/cosine generator that implements the trig identities: > > Sum formulas for sine and cosine > > sin(w + deltaw) = sin(w) * cos(deltaw) + cos(w) * sin(deltaw) > > cos(w + deltaw) = cos(w) * cos(deltaw) � sin(w) * sin(deltaw) > > > > The tables required are: > > #1 one quadrant sin(w)/cos(w) table of 2^M entries > i don't understand what this table is for at all? tan(w)? what do you > do with it?That's one quadrant used for sine or cosine generation, the usual single lookup table.> > #2 sin(deltaw) table of 2^N entries > > #3 cos(deltaw) table of 2^N entries > > (#3 only if cosine is generated as well as sine) > > you need both tables, even if you only want to generate the sine, using > this method.Right you are. It is a three table method for either sine or cosine.> ... > if there is a smaller table with just a selection of > pertinent deltaw, then you might need a pair of tables for the reduced > number of entries. > > ... > r b-jBoth delta tables are required. They cover angles 0 to 0.5*pi/(2^M), that is, the interval between the entries in the coarse one quadrant lookup table used for both sine and cosine. It looks like Jason Betts and Uwe Hercksen have since posted suggesting the same approach. Dale B. Dalrymple

Reply by ●March 18, 20132013-03-18

Rickman, 1. I did a quarter-wave table once, and I put in the phase offset exactly like you said. That is, the 0th index wasn't sin(0), but rather something like sin(2*pi/(2*N)) where N is the number of elements in the table. I didn't find that in any references either. That also helps because since you don't have to store the value -1 in the ROM so you don't have to scale the whole thing to avoid having the extra bit you'd need to store the value -1. 2. I'm doing another table now, but I'm just doing a full-wave. If you are using a Xilinx, you can do a 1024*18 lookup in a single BRAM. (But it sounds like you may be using an Actel or something if you don't have any multipliers.) 3. To answer your question about the old days, I think they used CORDICs, which get you great accuracy with the fewest gates. But it does take you about a clock cycle per bit of resolution, which is time you probably don't have. They didn't have blockRAMs either. 4. Like Rob said, you already have the derivative of sine stored in the table, so you don't actually have to do a subtraction to find the slope. If you want more accuracy, you can do a 2nd-order Taylor, because you also have the 2nd derivative already in the table. However, this takes another multiplier. I think the Xilinx CoreGen core does this. 5. Make sure you're doing the kind of rounding you want. Sometimes something like Matlab does a round toward +inf when really you want a round away from zero. 6. I'm using VHDL now and am doing the whole thing in HDL. That is, I'm using the math and sfixed library to calculate the ROM values so I don't need Matlab or any preprocessing, which makes it all cleaner. I don't know if the Verilog math functions are supported by the tools yet, but you could still do the same thing with your own Maclaurin function. -Kevin

Reply by ●March 18, 20132013-03-18

Robert Baer wrote:> > glen herrmannsfeldt wrote: > > http://ia601506.us.archive.org/8/items/bitsavers_nationaldaMOSIntegratedCircuits_20716690/1972_National_MOS_Integrated_Circuits.pdf > > > > The description starts on page 273. > > > Problem with that..could see only front and back cover. > Different source perhaps?It works for me. Downloaded & opened with Adobe Acrobat X Ver.10.1.6 -- Politicians should only get paid if the budget is balanced, and there is enough left over to pay them.

Reply by ●March 18, 20132013-03-18

On 3/17/2013 8:03 PM, Tim Williams wrote:> "Jon Kirwan" <jonk@infinitefactors.org> wrote in message > news:r2ick8d61stgkekaelt9m5k4sqh0qc0sgm@4ax.com... >> You already know why. Just not wearing your hat right now. >> Two terms of taylors at 0 of cos(x) is x + x^2/2. Two at >> sin(x) is 1 + x. One is quadratic, one is linear. > > Oops, should be: > cos(x) ~= 1 - x^2/2 > sin(x) ~= x (- x^3/6) > Mixed up your odd/even powers :) > > Of course, the cubic is less important than the quadratic, so sin(x) ~= > x is sufficient for most purposes. > > I suppose you could potentially synthesize the functions using low power > approximations (like these) and interpolating between them over modulo > pi/4 segments. Probably wouldn't save many clock cycles relative to a > CORDIC, higher order polynomial, or other, for a given accuracy, so the > standard space-time tradeoff (LUT vs. computation) isn't changed.An in-between approach is quadratic interpolation. Rickman reads Forth. http://users.erols.com/jyavins/typek.htm may inspire him. (It's all fixed-point math.) ((As long as there is interpolation code anyway, an octant is sufficient.)) Jerry -- Engineering is the art of making what you want from things you can get. �����������������������������������������������������������������������

Reply by ●March 18, 20132013-03-18

On 3/17/2013 6:35 PM, rickman wrote:> On 3/17/2013 6:23 PM, langwadt@fonz.dk wrote: >> On Mar 17, 10:56 pm, rickman<gnu...@gmail.com> wrote: >>> On 3/16/2013 8:02 PM, langw...@fonz.dk wrote: >>> >>>> why not skip the lut and just do a full sine approximation >>> >>>> something like this, page 53-54 >>> >>>> http://www.emt.uni-linz.ac.at/education/Inhalte/pr_mderf/downloads/AD... >>>> >>> >>> I'm not sure this would be easier. The LUT an interpolation require a >>> reasonable amount of logic. This requires raising X to the 5th power >>> and five constant multiplies. My FPGA doesn't have multipliers and this >>> may be too much for the poor little chip. I suppose I could do some of >>> this with a LUT and linear interpolation... lol >>> >> >> I see, I assumed you had an MCU (or fpga) with multipliers >> >> considered CORDIC? > > I should learn that. I know it has been used to good effect in FPGAs. > Right now I am happy with the LUT, but I would like to learn more about > CORDIC. Any useful referrences?http://www.andraka.com/files/crdcsrvy.pdf should be a good start. Do you remember Ray Andraka? Jerry -- Engineering is the art of making what you want from things you can get. �����������������������������������������������������������������������

Reply by ●March 18, 20132013-03-18

On 18 Mar 2013 09:06:13 GMT, Jasen Betts <jasen@xnet.co.nz> wrote:>On 2013-03-18, Eric Jacobsen <eric.jacobsen@ieee.org> wrote: >> On Sun, 17 Mar 2013 18:35:37 -0400, rickman <gnuarm@gmail.com> wrote: > >> If you are implementing in an FPGA with no multipliers, then CORDIC is >> the first thing you should be looking at. > >?? CORDIC requires even more multiplies > > >But you can muliply two short lookup tables to simulate a longer lookup >table using he identity: > > sin(a+b) = sin(a)*cos(b) + cos(a)*sin(b) > > so you do a coarse lookup table with sin(a) (and you can get cos(a) from it >ti if a is some integer factor of 90 degrees) > > and do two fine lookup tables for sin(b) and cos(b) > > eg: you can do 16 bits of angle at 16 bit resoluton in 512 bytes of > table and ~80 lines of assembler code > (but the computation needs two multiplies)He indicated that his target is an FPGA that has no multiplies. This is the exact environment where the CORDIC excels. LUTs can be expensive in FPGAs depending on the vendor and the device. Almost any DSP process, including a large DFT, can be done in a LUT, it's just that nobody wants to build memories that big. It's all about the tradeoffs. FPGA with no multipliers -> CORDIC, unless there are big memories available in which case additional tradeoffs have to be considered. Most FPGAs that don't have multipliers also don't have a lot of memory for LUTS.> the first two bits determine the quadrant. > > the next 7 bits are on a coarse lookup table > 128 entries x 16 bits is 256 bytes > > next bit defines the sign of b (if set b is negative,add one to a) > > and the final 6 are the rest of b > > from a 64 entry table you can get sin(|b|) > and another 64 for cos(|b|) > > treat b=0 as a special case - have entries for 1..64 in the table > correct sin(|b|) for negative b > > a few branches and bit-shifts, two multiplies, one add and you have > your value.There are lots of ways to do folded-wave implementations for sine generation, e.g., DDS implementations, etc. Again, managing the tradeoffs with the available resources usually drives how to do get it done. Using a quarter-wave LUT with an accumulator has been around forever, so it's a pretty well-studied problem.> >-- >⚂⚃ 100% naturalEric Jacobsen Anchor Hill Communications http://www.anchorhill.com

Reply by ●March 18, 20132013-03-18

On Mon, 18 Mar 2013 10:42:56 -0700 (PDT), Kevin Neilson <kevin.neilson@xilinx.com> wrote:>Rickman, >1. I did a quarter-wave table once, and I put in the phase offset exactly = >like you said. That is, the 0th index wasn't sin(0), but rather something = >like sin(2*pi/(2*N)) where N is the number of elements in the table. I did= >n't find that in any references either. That also helps because since you = >don't have to store the value -1 in the ROM so you don't have to scale the = >whole thing to avoid having the extra bit you'd need to store the value -1.Yup. Another reason to not store 0 or +/-1 is that it can help minimize any DC bias in the generated waves.>2. I'm doing another table now, but I'm just doing a full-wave. If you ar= >e using a Xilinx, you can do a 1024*18 lookup in a single BRAM. (But it so= >unds like you may be using an Actel or something if you don't have any mult= >ipliers.) > >3. To answer your question about the old days, I think they used CORDICs, = >which get you great accuracy with the fewest gates. But it does take you a= >bout a clock cycle per bit of resolution, which is time you probably don't = >have. They didn't have blockRAMs either.That's the main downside to CORDICs. Given clock speeds these days it's generally easier to use algorithms that have some latency.>4. Like Rob said, you already have the derivative of sine stored in the ta= >ble, so you don't actually have to do a subtraction to find the slope. If = >you want more accuracy, you can do a 2nd-order Taylor, because you also hav= >e the 2nd derivative already in the table. However, this takes another mul= >tiplier. I think the Xilinx CoreGen core does this. > >5. Make sure you're doing the kind of rounding you want. Sometimes someth= >ing like Matlab does a round toward +inf when really you want a round away = >from zero. > >6. I'm using VHDL now and am doing the whole thing in HDL. That is, I'm u= >sing the math and sfixed library to calculate the ROM values so I don't nee= >d Matlab or any preprocessing, which makes it all cleaner. I don't know if= > the Verilog math functions are supported by the tools yet, but you could s= >till do the same thing with your own Maclaurin function. > >-KevinEric Jacobsen Anchor Hill Communications http://www.anchorhill.com

Reply by ●March 18, 20132013-03-18

On 3/17/2013 8:22 PM, John Larkin wrote:> On Sun, 17 Mar 2013 19:03:21 -0400, rickman<gnuarm@gmail.com> wrote: > >> On 3/17/2013 6:57 PM, John Larkin wrote: >>> On Sun, 17 Mar 2013 18:44:07 -0400, rickman<gnuarm@gmail.com> wrote: >>> >>>> On 3/17/2013 6:20 PM, John Larkin wrote: >>>>> >>>>> It would be interesting to distort the table entries to minimize THD. The >>>>> problem there is how to measure very low THD to close the loop. >>>> >>>> It could also be a difficult problem to solve without some sort of >>>> exhaustive search. Each point you fudge affects two curve segments and >>>> each curve segment is affected by two points. So there is some degree >>>> of interaction. >>>> >>>> Anyone know if there is a method to solve such a problem easily? I am >>>> sure that my current approach gets each segment end point to within �1 >>>> lsb and I suppose once you measure THD in some way it would not be an >>>> excessive amount of work to tune the 256 end points in a few passes >>>> through. This sounds like a tree search with pruning. >>> >>> If I could measure the THD accurately, I'd just blast in a corrective harmonic >>> adder to the table for each of the harmonics. That could be scientific or just >>> iterative. For the 2nd harmonic, for example, add a 2nd harmonic sine component >>> into all the table points, many of which wouldn't even change by an LSB, to null >>> out the observed distortion. Gain and phase, of course. >> >> I don't think you are clear on this. The table and linear interp are as >> good as they can get with the exception of one or possibly two lsbs to >> minimize the error in the linear interp. These is no way to correct for >> this by adding a second harmonic, etc. Remember, this is not a pure >> table lookup, it is a two step approach. > > Why can't some harmonic component be added to the table points to improve > downstream distortion? Of course, you'd need a full 360 degree table, not a > folded one.Mostly because the table is already optimal, unless you want to correct for other components.>>>> I'm assuming there would be no practical closed form solution. But >>>> couldn't the THD be calculated in a simulation rather than having to be >>>> measured on the bench? >>> >>> Our experience is that the output amplifiers, after the DAC and lowpass filter, >>> are the nasty guys for distortion, at least in the 10s of MHz. Lots of >>> commercial RF generators have amazing 2nd harmonic specs, like -20 dBc. A table >>> correction would unfortunately be amplitude dependent, so it gets messy fast. >> >> Ok, if you are compensating for the rest of the hardware, that's a >> different matter... > > For everything, including the DAC, but especially downstream stuff. The amps and > stuff may have 100x more distortion than the lookup table does.Ok, then you can add that to the LUT. As you say, it will require a full 360 degree table. This will have a limit on the order of the harmonics you can apply, but I expect the higher order harmonics would have very, very small coefficients anyway. -- Rick