
DDS LUT Size Calculation

Started by rickman February 7, 2017
On 2/7/2017 9:35 PM, Cedron wrote:
>> I just wrote a program to calculate the size of the LUT required to
>> generate the coarse sin/cos values (a in the following equation) to use
>> the trig identity sin(a+b) = sin(a)cos(b) + cos(a)sin(b), meaning the
>> max value of b is sufficiently small that cos(b) is 1.0 to within one
>> lsb of the output sine word.
>>
>> The size of a and b (meaning the number of bits) always turns out to be
>> half of the total. Obviously there is something guiding this. But I
>> don't know of any trig rules that would make it so.
>>
>> --
>>
>> Rick C
>
> Hi Rick,
>
> I'm going to take a stab here.
>
> The Taylor series for cosine is
>
> cos( b ) ~=~ 1 - b^2/2 + b^4/4! - ...
>
> So if b is very small then
>
> 1 - cos( b ) ~=~ b^2/2
>
> b takes roughly half as many bits as b^2. The division by two will
> alter the bit count by one.
>
> Hope this is even close and it helps.
Yes, that is a good way to look at it. Thanks.

--

Rick C
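Cedron's b^2/2 bound is easy to sanity-check numerically. A minimal Python sketch (illustrative, not from the thread; it assumes the phase word spans a full 2*pi cycle, so quadrant folding or the signed-output bit Rick mentions later would shift the split by a few bits):

import math

def fine_bits(n_out, phase_bits):
    """How many low phase bits can be lumped into b while the dropped
    term, 1 - cos(b), stays under one output lsb (2**-n_out)."""
    b_bits = 0
    while b_bits < phase_bits:
        # largest fine angle if b occupied one more bit
        b_max = 2 * math.pi * 2.0 ** (b_bits + 1 - phase_bits)
        if 1.0 - math.cos(b_max) >= 2.0 ** -n_out:
            break
        b_bits += 1
    return b_bits

for n in (8, 12, 16, 20):
    b = fine_bits(n, n)
    print(n, "phase bits:", n - b, "coarse (a) +", b, "fine (b)")
# a comes out near n/2 plus a small constant from the 2*pi scaling,
# consistent with the "about half" observation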
On Tuesday, February 7, 2017 at 9:26:35 PM UTC-5, rickman wrote:
> On 2/7/2017 3:42 PM, Steve Pope wrote:
>> dbd <d.dalrymple@sbcglobal.net> wrote:
>>
>>> There must be further unstated requirements or assumptions because, in
>>> general, the size of the lookup table addresses and the value of the lsb
>>> of the looked up data are independent. You might get different results
>>> from 16 bit fixed point and 64 bit float data.
>>
>> I agree, I am curious as to what exactly is being observed here.
>
> What part don't you understand?
i understand what an LUT is.

i understand quantization error in forming the index into an LUT.

i understand how simply rounding to the nearest entry (or nearest integer
index) is the same as convolving the values of the LUT with a rectangular
pulse function.

i understand how linear interpolation is the same as convolving the
values of the LUT with a triangular pulse function.

i understand from either how to compute the maximum error.

i understand that so far, i have said nothing about the number of bits of
either the input or output word (which is another source of quantization
error).

but this is about figuring out your error as a function of the number of
entries in the LUT, is it not?

i don't understand what your issues are and i will not read the FORTH
program.

r b-j
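To put numbers on that rectangular-vs-triangular comparison: the rectangle gives a first-order worst-case error, the triangle a second-order one. A brute-force Python sketch (table size and grid density are arbitrary choices, not from the thread):

import math

N = 256                                          # LUT entries over a quadrant
w = (math.pi / 2) / N                            # angle step per entry
lut = [math.sin(i * w) for i in range(N + 1)]    # endpoint kept for interp

worst_round = worst_interp = 0.0
GRID = 100000
for k in range(GRID):
    x = (k / GRID) * N                           # fractional LUT index
    true = math.sin(x * w)
    i, f = int(x), x - int(x)
    # nearest-entry rounding: first-order error, ~ (w/2)*cos
    worst_round = max(worst_round, abs(lut[round(x)] - true))
    # linear interpolation: second-order error, ~ (w**2/8)*sin
    interp = lut[i] + (lut[i + 1] - lut[i]) * f
    worst_interp = max(worst_interp, abs(interp - true))

print(worst_round, worst_interp)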
On 2/7/2017 11:33 PM, robert bristow-johnson wrote:
> On Tuesday, February 7, 2017 at 9:26:35 PM UTC-5, rickman wrote:
>> [...]
>>
>> What part don't you understand?
>
> i understand what an LUT is.
>
> i understand quantization error in forming the index into an LUT.
>
> i understand how simply rounding to the nearest entry (or nearest
> integer index) is the same as convolving the values of the LUT with a
> rectangular pulse function.
>
> i understand how linear interpolation is the same as convolving the
> values of the LUT with a triangular pulse function.
>
> i understand from either how to compute the maximum error.
>
> i understand that so far, i have said nothing about the number of bits
> of either the input or output word (which is another source of
> quantization error).
>
> but this is about figuring out your error as a function of the number
> of entries in the LUT, is it not?
>
> i don't understand what your issues are and i will not read the FORTH
> program.
Yeah, I don't expect anyone to read the program. It's there in case
someone is interested in the language. This is a good example that shows
how useful and easy to use the language is. BTW, reading the Forth
program in this case is no harder than looking at the keystrokes used to
perform the calculation on an RPN calculator with some function keys
defined.

After reading the list of things you understand, I don't understand what
you don't understand. I think you are making this more complex than it
needs to be. I'll say you should read my other post in this thread where
I draw a diagram of the hardware doing the calculation.

The calculation is using the trig identity sin(a+b) = sin(a)cos(b) +
cos(a)sin(b). a and b represent the msbs and lsbs, respectively, of the
phase word used to calculate the sine value. The crux of the issue is to
determine the minimum size of a that meets an accuracy requirement set by
the output word size. The size of a in turn determines the size of the
LUT, which is important in determining the practicality for a given
application.

The trig function is reduced by assuming cos(b) is 1.0. To meet the
accuracy requirement this restricts the max size of b. That is the
calculation I was doing: how many bits can be allocated to b, with the
remaining phase word bits being a, for a given error?

Turns out the problem simplifies because in the region of 0 phase,
cos(b) is essentially quadratic. Approximately half the phase angle bits
need to be used to address the LUT (a) and half used for the remaining
term (b). This gives just a bit less than the required accuracy in the
result from the LUT. An extra bit is gained, however, because the output
from the LUT is unsigned and the end result is signed. The final error
contribution from the LUT is halved.

This got started because someone in another group seems to think only the
CORDIC can give "exact" results in a "small" amount of logic. The CORDIC
is claimed to be "small" because it has no multiplies. I pointed out the
CORDIC algorithm is essentially the same complexity as a multiply, order
(N^2). Given that multipliers are available in most FPGAs (or a multiply
instruction in CPUs), I couldn't understand why CORDIC was still used. I
found that person is producing a 32 bit result, which is not so easy with
a LUT-multiply approach. So I wanted to quantify it.

Once I got the result of "half the word width" for the a and b
quantities, I realized I went down this road some five or six years ago
and arrived at the same result. I'm not sure I realized then the reason
for the result, the quadratic nature of sine near 90 degrees.

I must be getting old. Hopefully I won't forget this again. I expect to
be designing a DDS in the near future.

--

Rick C
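For concreteness, a floating-point model of that decomposition (a Python sketch, not Rick's Forth program or the fixed-point hardware; the 16-bit phase word and the 11/5 split are assumptions for illustration):

import math

A_BITS, B_BITS = 11, 5                    # coarse LUT address / fine residual
PHASE_BITS = A_BITS + B_BITS              # 16-bit phase word assumed

def angle(phase):                         # phase word -> radians, full cycle
    return 2 * math.pi * phase / 2**PHASE_BITS

sin_lut = [math.sin(angle(a << B_BITS)) for a in range(2**A_BITS)]
cos_lut = [math.cos(angle(a << B_BITS)) for a in range(2**A_BITS)]

def dds_sin(phase):
    a = phase >> B_BITS                   # msbs address the tables
    b = phase & (2**B_BITS - 1)           # lsbs form the small angle b
    # sin(a+b) = sin(a)cos(b) + cos(a)sin(b), with cos(b) taken as 1.0
    return sin_lut[a] + cos_lut[a] * math.sin(angle(b))

worst = max(abs(dds_sin(p) - math.sin(angle(p)))
            for p in range(2**PHASE_BITS))
print(worst)   # the dropped sin(a)*(1 - cos(b)) term; under 2**-16 here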
>Once I got the result of "half the word width" for the a and b
>quantities, I realized I went down this road some five or six years ago
>and arrived at the same result. I'm not sure I realized then the reason
>for the result, the quadratic nature of sine near 90 degrees.
>
>I must be getting old. Hopefully I won't forget this again. I expect
>to be designing a DDS in the near future.
>
>--
>
>Rick C
In general, the Taylor series expansion is:

f(x+d) = f(x) + f'(x)*d + f''(x)*d^2/2 + ...

In the case of the sine function it is:

sin(x+d) = sin(x) + cos(x)*d - sin(x)*d^2/2 - ...

So your approximation is equivalent to a Taylor series truncated at the
linear term. You can also see that when x = pi/2

sin(pi/2+d) ~=~ 1 - d^2/2

May I suggest you make a change to your methodology that will improve
your accuracy marginally.

Let w = (pi/2) / N be your wedge width (pizza slice).

lookup1[i] = sin( i*w ) for i=0 to N

lookup2[i] = lookup1[i+1] - lookup1[i] for i=0 to N-1

Then when you get an argument value, split it into an integer part and a
fraction:

v = i + f

sin( v ) ~=~ lookup1[i] + lookup2[i]*f

The amount of calculation in the two approaches is the same, but the
results are slightly different. Your approach can be thought of as
following a tangent line at the lower point of the arc of your wedge, and
my suggested approach can be thought of as following a line segment that
leads directly to the next point.

I don't think the difference is significant enough to alter your bit
counts.

Ced
>v = i + f
>
>sin( v ) ~=~ lookup1[i] + lookup2[i]*f
I shouldn't have used "sin" there, since it is the rescaled version. A
more accurate rendering would be:

v = angle / w

v = i + f

sin( angle ) ~=~ lookup1[i] + lookup2[i]*f

Sorry about that.

Ced
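In code, the corrected two-table scheme might look like this (a Python sketch with an arbitrary table size; the endpoint clamp is an added detail not in the posts):

import math

N = 256                                   # wedges over the first quadrant
w = (math.pi / 2) / N                     # wedge width
lookup1 = [math.sin(i * w) for i in range(N + 1)]
lookup2 = [lookup1[i + 1] - lookup1[i] for i in range(N)]

def sin_chord(angle):
    """Chord interpolation for 0 <= angle <= pi/2."""
    v = angle / w                         # rescale the angle to index units
    i = min(int(v), N - 1)                # integer part (clamped at the top)
    f = v - i                             # fractional part
    return lookup1[i] + lookup2[i] * f

print(sin_chord(0.7), math.sin(0.7))      # agree to about (w**2)/8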
rickman <gnuarm@gmail.com> writes:

> Does the curve of the sine function approach x^2 at 90 degrees?
sin(x - 90°) = -cos(x)

Since cos(x) ~= 1 - x^2/2,

sin(x - 90°) ~= x^2/2 - 1

Scott

--

Scott Hemphill   hemphill@alumni.caltech.edu
"This isn't flying. This is falling, with style." -- Buzz Lightyear
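A quick numeric check of the identity and the approximation (illustrative value of x):

import math

x = 0.05
print(math.sin(x - math.pi / 2))   # -0.9987502603...
print(-math.cos(x))                # the same, by the identity
print(x**2 / 2 - 1)                # -0.99875; off only by the x**4/24 term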
On 2/8/2017 10:18 AM, Cedron wrote:
> [...]
>
> May I suggest you make a change to your methodology that will improve
> your accuracy marginally.
>
> Let w = (pi/2) / N be your wedge width (pizza slice).
>
> lookup1[i] = sin( i*w ) for i=0 to N
>
> lookup2[i] = lookup1[i+1] - lookup1[i] for i=0 to N-1
>
> Then when you get an argument value, split it into an integer part and
> a fraction:
>
> v = i + f
>
> sin( v ) ~=~ lookup1[i] + lookup2[i]*f
>
> The amount of calculation in the two approaches is the same, but the
> results are slightly different. Your approach can be thought of as
> following a tangent line at the lower point of the arc of your wedge,
> and my suggested approach can be thought of as following a line segment
> that leads directly to the next point.
>
> I don't think the difference is significant enough to alter your bit
> counts.
I'm not sure there *is* much of a difference, and I'm pretty sure your
method is not necessarily optimal.

You are assuming that using the slope of the chord is a better fit to the
curve than the tangent. It would clearly be a better fit if the end
points were adjusted so the chord crossed the circle. In other words, use
a chord of a circle slightly larger than the unit circle. Without this
correction I expect the difference between the two is very, very small,
with the winner depending on how you measure it (area under the curve,
least squares, etc.).

I will also point out that over short distances the linear approximation
is *very* good regardless of the choice of the two. At low angles the
sine function is very linear, while at high angles the distance for the
interpolation is very small, diminishing the effect of any errors.

--

Rick C
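Those three candidates (tangent at the left endpoint, plain chord, and the chord lifted as if drawn on a slightly larger circle) can be compared over a single wedge. A rough Python sketch with an arbitrary wedge, using the floor convention for the fraction; the convention question comes up again later in the thread:

import math

w = (math.pi / 2) / 256                   # one wedge of a 256-entry table
x0 = 1.0                                  # arbitrary wedge start angle
s0, s1 = math.sin(x0), math.sin(x0 + w)

def tangent(t):                           # tangent at the left endpoint
    return s0 + math.cos(x0) * (t * w)

def chord(t):                             # line to the next table point
    return s0 + (s1 - s0) * t

# max sag of the curve above the chord (sine is concave on this wedge)
sag = max(math.sin(x0 + k / 1000 * w) - chord(k / 1000) for k in range(1001))

def lifted(t):                            # chord of a slightly larger circle
    return chord(t) + sag / 2

def max_err(f):
    return max(abs(f(k / 1000) - math.sin(x0 + k / 1000 * w))
               for k in range(1001))

# roughly w^2/2, w^2/8, and w^2/16 (each times sin(x0))
print(max_err(tangent), max_err(chord), max_err(lifted))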
On Wed, 8 Feb 2017 00:43:27 -0500, rickman <gnuarm@gmail.com> wrote:

> [...]
>
>This got started because someone in another group seems to think only
>the CORDIC can give "exact" results in a "small" amount of logic. The
>CORDIC is claimed to be "small" because it has no multiplies. I pointed
>out the CORDIC algorithm is essentially the same complexity as a
>multiply, order (N^2). Given that multipliers are available in most
>FPGAs (or a multiply instruction in CPUs), I couldn't understand why
>CORDIC was still used. I found that person is producing a 32 bit result,
>which is not so easy with a LUT-multiply approach. So I wanted to
>quantify it.
I've been puzzled sometimes when people cling to CORDIC algorithms when multipliers are available, although there are some niche applications where it seems to make sense.
>Once I got the result of "half the word width" for the a and b
>quantities, I realized I went down this road some five or six years ago
>and arrived at the same result. I'm not sure I realized then the reason
>for the result, the quadratic nature of sine near 90 degrees.
I seem to recall this result from back in the days when making a good DDS
for comm synchronization was a fresh enough thing that it was worth
cooking up your own stuff to improve performance. These days most of the
tricks are known, or it's easy enough to get good performance just by
throwing word-width complexity at it.
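For reference, a minimal rotation-mode CORDIC (a Python float model, only to make the structure visible: each of the N iterations is a shift-and-add on N-bit words, which is where the order-N^2 comparison with a multiply comes from; a real implementation would use fixed-point shifts):

import math

N_ITERS = 32
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITERS)]
GAIN = math.prod(math.cos(a) for a in ANGLES)   # ~0.6072529 (CORDIC gain)

def cordic_sin_cos(theta):
    """Rotation-mode CORDIC; converges for |theta| <= ~1.74 rad."""
    x, y, z = GAIN, 0.0, theta                  # pre-scaling absorbs the gain
    for i in range(N_ITERS):
        d = 1.0 if z >= 0.0 else -1.0           # rotate toward z == 0
        x, y = x - d * y * 2.0**-i, y + d * x * 2.0**-i   # shift-and-add step
        z -= d * ANGLES[i]
    return y, x                                 # (sin, cos)

s, c = cordic_sin_cos(0.6)
print(s, math.sin(0.6))
print(c, math.cos(0.6))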
On 2/8/2017 4:59 PM, eric.jacobsen@ieee.org wrote:
> On Wed, 8 Feb 2017 00:43:27 -0500, rickman <gnuarm@gmail.com> wrote:
>
> [...]
>
> I've been puzzled sometimes when people cling to CORDIC algorithms
> when multipliers are available, although there are some niche
> applications where it seems to make sense.
>
>> Once I got the result of "half the word width" for the a and b
>> quantities, I realized I went down this road some five or six years
>> ago and arrived at the same result. I'm not sure I realized then the
>> reason for the result, the quadratic nature of sine near 90 degrees.
>
> I seem to recall this result from back in the days when making a good
> DDS for comm synchronization was a fresh enough thing that it was
> worth cooking up your own stuff to improve performance. These days
> most of the tricks are known, or it's easy enough to get good
> performance just by throwing word-width complexity at it.
Most DDS resolutions are limited by the resolution of the analog
converters, either input or output. The person I was discussing this with
indicated they use 32 bit sines but did not give the use case. At 32 bits
you are making some large LUTs, and CORDIC may well be the better
solution unless you have off chip memory that can keep up with your data
rates.

--

Rick C
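Back-of-envelope sizing for the LUT-multiply approach (a sketch; the "n/2 plus a few" address width follows the earlier analysis for a full-cycle phase word, and the exact constant depends on phase scaling and quadrant folding):

# entries = 2**a_bits, with a_bits ~ n/2 + 3 for a full-cycle phase word
for n_out in (16, 24, 32):
    a_bits = n_out // 2 + 3
    entries = 2 ** a_bits
    table_bytes = entries * n_out * 2 // 8    # sin and cos tables
    print(f"{n_out}-bit output: 2^{a_bits} entries, ~{table_bytes} bytes")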
>I'm not sure there *is* much of a difference, and I'm pretty sure your
>method is not necessarily optimal.
>
>You are assuming that using the slope of the chord is a better fit to
>the curve than the tangent. It would clearly be a better fit if the end
>points were adjusted so the chord crossed the circle. In other words,
>use a chord of a circle slightly larger than the unit circle. Without
>this correction I expect the difference between the two is very, very
>small, with the winner depending on how you measure it (area under the
>curve, least squares, etc.).
>
>I will also point out that over short distances the linear approximation
>is *very* good regardless of the choice of the two. At low angles the
>sine function is very linear, while at high angles the distance for the
>interpolation is very small, diminishing the effect of any errors.
>
>--
>
>Rick C
I agree with everything you said, sort of. I never said "optimal", I said
"better".

It is more accurate to consider the tangent and chord on a graph of the
sine function, rather than on the unit circle. Which of these two
alternatives would be better under any reasonable metric on any part of
the graph depends on the domain of your fractional portion.

If you are taking the floor of your value, so the fractional portion goes
from zero to one, the chord method (my suggested one) is better,
particularly in the .5 to .999999... range. If you are taking the nearest
integer to your value, so the fractional portion ranges from -.5 to .5,
then using the tangent line is better.

Whether the differences in the two approaches even impact the least
significant bit in your implementation could be determined with a simple
test program.

Good luck with your project no matter how you decide to proceed.

Ced
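That simple test program might look like the following Python sketch (table size, output width, and sample count are arbitrary choices): quantize each scheme to the output lsb and count phase points where it disagrees with the correctly rounded sine.

import math

N, OUT_BITS, SAMPLES = 256, 16, 1 << 16
w = (math.pi / 2) / N
lut = [math.sin(i * w) for i in range(N + 1)]
SCALE = 2 ** (OUT_BITS - 1) - 1            # full scale of a signed output

def q(x):                                  # quantize to the output grid
    return round(x * SCALE)

bad_tan = bad_chord = 0
for k in range(SAMPLES):
    v = (k / SAMPLES) * N                  # index with a floor-style fraction
    i, f = int(v), v - int(v)
    true = q(math.sin(v * w))
    tan = q(lut[i] + math.cos(i * w) * w * f)        # tangent at left endpoint
    chord = q(lut[i] + (lut[i + 1] - lut[i]) * f)    # chord to the next entry
    bad_tan += tan != true
    bad_chord += chord != true

print(bad_tan, bad_chord)   # with floor fractions, the chord flips fewer lsbs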