Hi all,
I've tried to read the asm code of the Sin & Cos functions, but didn't really understand anything there. I am puzzled by this problem. Thank you very much for your help.

The asm code from the TI TMS320F28335 is fast, using table look-up plus a Taylor series expansion between the look-up table entries, but I don't understand it. (1) and (2) are the Taylor series expansions about the point x0:

sin x = sin x0 + cos x0*(x-x0) - (1/2)*sin x0*(x-x0)^2 - (1/3!)*cos x0*(x-x0)^3 + ...   (1)
cos x = cos x0 - sin x0*(x-x0) - (1/2)*cos x0*(x-x0)^2 + (1/3!)*sin x0*(x-x0)^3 + ...   (2)

_IQsinTable:
    .long 0           ; sin( 2*pi* 0/512 )   = 0.000000000000 in Q30
    .long 13176464    ; sin( 2*pi* 1/512 )   = 0.012271538286 in Q30
    .....

_IQcosTable:
    .long 1073741824  ; sin( 2*pi* 128/512 ) = 1.000000000000 in Q30
    .long 1073660973  ; sin( 2*pi* 129/512 ) = 0.999924701839 in Q30
    .....
    .long 1073741824  ; sin( 2*pi* 640/512 ) = 1.000000000000 in Q30
_IQcosTableEnd:

_IQ24cosPU                  ; (angle is per-unit, passed in ACC)
    MOV    *SP++,#0x3F6B    ; PI in Q24 (0x03243F6B), low word
    MOV    *SP++,#0x0324    ; PI in Q24, high word
    MOVL   XAR6,#0x3FE100   ; XAR6 points to _IQcosTable
    MOVL   XAR7,#0x3FE000   ; XAR7 points to _IQsinTable
    SETC   OVM
    ABS    ACC              ; absolute value of angle
    MPYB   P,T,#0
    ASR64  ACC:P,15
    AND    @AL,#0x01FF
    LSL    AL,1             ; offset of the table entry
    MOVZ   AR0,@AL
    MOVL   XT,*--SP
    QMPYUL P,XT,@P
    MOVB   ACC,#0
    MOVL   XT,@P
    MOVL   XAR4,*+XAR6[AR0]
    MOVL   XAR5,*+XAR7[AR0]
    SUBL   ACC,@XAR4        ; -sin@ -> ACC
    ASR64  ACC:P,1          ; Q value changed from Q30 to Q29
    QMPYL  P,XT,@XAR5
    CLRC   OVM
    MPY    P,@PH,#10922     ; from here on, I'm puzzled
    ADDL   ACC,@P
    QMPYL  ACC,XT,@ACC
    SUBL   ACC,@XAR5
    QMPYL  ACC,XT,@ACC
    ADDL   ACC,@XAR4
    ASR64  ACC:P,6          ; Q value changed from Q30 to Q24
    LRETR
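For what it's worth, here is a floating-point reconstruction of what the routine appears to do - table look-up plus a low-order Taylor correction between entries - with all the Q30/Q24 fixed-point scaling stripped out. The names and the second-order truncation are mine, not TI's; the real routine apparently also carries a cubic term (the constant 10922 is about 2^15/3 in Q15, presumably combining with an earlier 1/2 to form the 1/3! factor):

```python
import math

N = 512  # table entries per full cycle, as in _IQsinTable

# Tables indexed by k hold sin/cos(2*pi*k/N); the asm stores these in Q30.
sin_tab = [math.sin(2 * math.pi * k / N) for k in range(N)]
cos_tab = [math.cos(2 * math.pi * k / N) for k in range(N)]

def sin_pu(u):
    """sin(2*pi*u) for a per-unit angle u, via table + 2nd-order Taylor."""
    t = (u % 1.0) * N
    k = int(t)                       # nearest-below table entry
    dx = (t - k) * 2 * math.pi / N   # radian distance from that entry
    s0, c0 = sin_tab[k], cos_tab[k]
    # sin(x0 + dx) = sin x0 + cos x0*dx - (1/2)*sin x0*dx^2 + ...
    return s0 + c0 * dx - 0.5 * s0 * dx * dx
```

With 512 entries per cycle the neglected cubic term is below about 3e-7, so even this second-order version is accurate to roughly single precision.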
Sin & Cos function implementation
Started by ●November 8, 2008
Reply by ●November 8, 2008
wdhxy wrote:
> Hi all,
> I've tried to read the asm code of the Sin & Cos functions, but
> didn't really understand anything there. I am puzzled by this problem.
>
> Thank you very much for your help.

What don't you understand: the algorithm and underlying equations, or the assembly-code implementation?

Jerry
--
Engineering is the art of making what you want from things you can get.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
Reply by ●November 8, 2008
On Sat, 08 Nov 2008 20:57:57 -0500, Jerry Avins wrote:
> wdhxy wrote:
>> [...]
>
> What don't you understand: the algorithm and underlying equations, or
> the assembly-code implementation?
>
> Jerry

(Jerry always gets there first with the on-the-spot questions.)

Note that the Taylor's expansion isn't the absolute bee's knees if you have unlimited memory space: in that case you can trim a few clock cycles with a best-fit polynomial that inherently corrects for the error of truncating a Taylor's series.

I strongly suspect that it's more trouble than it's worth for almost all instances, but it's fun to keep in mind.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Do you need to implement control loops in software? "Applied Control Theory for Embedded Systems" gives you just what it says. See details at http://www.wescottdesign.com/actfes/actfes.html
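Tim's point can be illustrated numerically (this sketch is mine, not from the thread): interpolating sin at Chebyshev nodes is a cheap stand-in for a true minimax fit, and a degree-5 polynomial built that way beats the worst-case error of the degree-5 truncated Taylor series over [0, pi/2] by a wide margin.

```python
import math

def cheb_nodes(n, a, b):
    """n Chebyshev points on [a, b] -- near-minimax interpolation nodes."""
    return [(a + b) / 2 + (b - a) / 2 * math.cos((2 * k + 1) * math.pi / (2 * n))
            for k in range(n)]

def lagrange(xs, ys, x):
    """Evaluate the interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for j in range(len(xs)):
        w = 1.0
        for m in range(len(xs)):
            if m != j:
                w *= (x - xs[m]) / (xs[j] - xs[m])
        total += ys[j] * w
    return total

a, b = 0.0, math.pi / 2
xs = cheb_nodes(6, a, b)              # 6 nodes -> degree-5 polynomial
ys = [math.sin(x) for x in xs]

def taylor5(x):
    """Degree-5 Taylor polynomial of sin about 0."""
    return x - x**3 / 6 + x**5 / 120

grid = [a + (b - a) * i / 1000 for i in range(1001)]
cheb_err = max(abs(lagrange(xs, ys, x) - math.sin(x)) for x in grid)
tayl_err = max(abs(taylor5(x) - math.sin(x)) for x in grid)
```

Here cheb_err comes out around 1e-5 while tayl_err is a few times 1e-3: the Taylor polynomial is only optimal near its expansion point, the fitted one spreads the error evenly.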
Reply by ●November 9, 2008
Tim Wescott wrote:
> (Jerry always gets there first with the on-the-spot questions).
>
> Note that the Taylor's expansion isn't the absolute bee's knees if you
> have unlimited memory space: in that case you can trim a few clock
> cycles with a best-fit polynomial that inherently corrects for the
> error of truncating a Taylor's series.
>
> I strongly suspect that it's more trouble than it's worth for almost
> all instances, but it's fun to keep in mind.

I've found that most of the time, simple* quadratic interpolation into a table is a good trade-off of table size, accuracy, and computation speed. There's an example at http://users.erols.com/jyavins/typek.htm

Jerry
__________________________________
* One can choose the interpolation coefficients for minimum error (mean squared or peak), and one can match the slopes at the endpoints, but the simplest approach - making the endpoints and centers of the intervals exact - is far simpler mathematically, and usually quite adequate.
--
Engineering is the art of making what you want from things you can get.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
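A floating-point sketch of what I take Jerry's "simplest approach" to be - each segment covered by the parabola that is exact at both endpoints and at the segment midpoint, i.e. three-point Lagrange interpolation per segment (his linked example may choose the coefficients differently):

```python
import math

SEG = 32  # parabolic segments over a full cycle
# Store sin at the two ends and the midpoint of every segment: 2*SEG+1 values.
xs = [2 * math.pi * k / (2 * SEG) for k in range(2 * SEG + 1)]
ys = [math.sin(x) for x in xs]

def sin_quad(x):
    """Quadratic interpolation, exact at segment endpoints and midpoints."""
    x %= 2 * math.pi
    seg = min(int(x / (2 * math.pi) * SEG), SEG - 1)
    x0, xm, x1 = xs[2*seg], xs[2*seg + 1], xs[2*seg + 2]
    y0, ym, y1 = ys[2*seg], ys[2*seg + 1], ys[2*seg + 2]
    # Parabola through (x0,y0), (xm,ym), (x1,y1), Lagrange form
    return (y0 * (x - xm) * (x - x1) / ((x0 - xm) * (x0 - x1))
          + ym * (x - x0) * (x - x1) / ((xm - x0) * (xm - x1))
          + y1 * (x - x0) * (x - xm) / ((x1 - x0) * (x1 - xm)))
```

With 32 segments (65 stored values) this lands around 6e-5 worst case over the full circle - roughly 20x better than linear interpolation through the same 65 points.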
Reply by ●November 9, 2008
On Nov 8, 8:27 pm, Jerry Avins <j...@ieee.org> wrote:
> I've found that most of the time, simple* quadratic interpolation into
> a table is a good trade-off of table size, accuracy, and computation
> speed. There's an example at http://users.erols.com/jyavins/typek.htm
> [...]

We simply use linear interpolation. It is easy to achieve 100dB+ SNR.
Reply by ●November 9, 2008
DigitalSignal wrote:
> We simply use linear interpolation. It is easy to achieve 100dB+ SNR.

If you use a LUT of 256 entries for one quadrant of sine, and optimize the LUT for minimum error with the linear interpolation, the max error is going to be about 3.7e-5, which is slightly less than 16-bit accuracy.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
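For comparison, here is a plain floating-point sketch of the linear-interpolation approach with a 256-entry quarter-wave table and quadrant folding. It simply samples sin at the knots - this is not the error-optimized table Vladimir describes, and all the names are mine:

```python
import math

N = 256  # table intervals covering one quadrant, 0 .. pi/2
tab = [math.sin((math.pi / 2) * k / N) for k in range(N + 1)]  # N+1 knots

def sin_lut(x):
    """sin(x) by quadrant folding + linear interpolation into a quarter table."""
    u = (x / (2 * math.pi)) % 1.0        # angle in turns, [0, 1)
    quad, frac = divmod(u * 4.0, 1.0)    # quadrant index, position inside it
    if int(quad) in (1, 3):
        frac = 1.0 - frac                # mirror in the 2nd and 4th quadrants
    t = frac * N
    k = int(t)
    y = tab[k] + (t - k) * (tab[min(k + 1, N)] - tab[k])
    return y if int(quad) < 2 else -y    # negate in the lower half-cycle
```

With straight samples at the knots the worst-case error is about h^2/8 with h = (pi/2)/256, i.e. a few times 1e-6; shifting the table entries off the exact samples, as Vladimir suggests, trades knot exactness for a smaller peak error.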
Reply by ●November 9, 2008
DigitalSignal wrote:
> On Nov 8, 8:27 pm, Jerry Avins <j...@ieee.org> wrote:
>> [...]
>
> We simply use linear interpolation. It is easy to achieve 100dB+ SNR.

You need more segments with linear interpolation. By storing an extra term that defines the segments' curvatures, many fewer segments are needed: extra computation at run time vs. a smaller LUT.

Did you see that a LUT of only 32 parabolic segments (33 end points) gives sines of 1% accuracy over 360 degrees? 32 parabolic segments over 90 degrees yields about 0.2% but requires quadrant manipulation.

Jerry
--
Engineering is the art of making what you want from things you can get.
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
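The segment-count trade-off Jerry describes can be quantified with a quick experiment (entirely my own, floating point): for a given worst-case error target, count how many table segments linear interpolation and endpoint-plus-midpoint parabolic interpolation of sin each need over a full cycle.

```python
import math

def max_err_linear(nseg):
    """Worst-case error of linear interpolation of sin over [0, 2*pi]."""
    xs = [2 * math.pi * k / nseg for k in range(nseg + 1)]
    worst = 0.0
    for k in range(nseg):
        y0, y1 = math.sin(xs[k]), math.sin(xs[k + 1])
        for i in range(50):
            x = xs[k] + (xs[k + 1] - xs[k]) * i / 49
            f = y0 + (y1 - y0) * (x - xs[k]) / (xs[k + 1] - xs[k])
            worst = max(worst, abs(f - math.sin(x)))
    return worst

def max_err_quad(nseg):
    """Worst-case error of endpoint+midpoint parabolas over [0, 2*pi]."""
    worst = 0.0
    for k in range(nseg):
        x0 = 2 * math.pi * k / nseg
        x1 = 2 * math.pi * (k + 1) / nseg
        xm = (x0 + x1) / 2
        y0, ym, y1 = math.sin(x0), math.sin(xm), math.sin(x1)
        for i in range(50):
            x = x0 + (x1 - x0) * i / 49
            p = (y0 * (x - xm) * (x - x1) / ((x0 - xm) * (x0 - x1))
               + ym * (x - x0) * (x - x1) / ((xm - x0) * (xm - x1))
               + y1 * (x - x0) * (x - xm) / ((x1 - x0) * (x1 - xm)))
            worst = max(worst, abs(p - math.sin(x)))
    return worst

def segments_needed(err_fn, target):
    """Smallest power-of-two segment count meeting the error target."""
    n = 4
    while err_fn(n) > target:
        n *= 2
    return n
```

In this power-of-two search, a 1e-3 target costs linear interpolation about eight times as many segments as the parabolic version, at the price of one extra stored value per segment and a few more multiplies per evaluation.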
Reply by ●November 10, 2008
> If you use a LUT of 256 entries for one quadrant of sine, and optimize
> the LUT for minimum error with the linear interpolation, the max error
> is going to be about 3.7e-5, which is slightly less than 16-bit accuracy.

I'm not suggesting an FPGA implementation to the OP, but here is what it might look like.

Xilinx's value range of FPGAs have 18kbit block RAMs spread throughout the FPGA fabric. A single block RAM could do 256x36 bits (and 512x36 bits, of course). Presumably 36 bits per LUT entry is excessive, so it might be more accurate to go 1024x18? Plus a little bit of logic to generate the right address depending on the quadrant. The RAMs are dual ported, so you get two channels out of the one block RAM. And this could be clocked at >200MHz, i.e. sample rate >200MHz. A Xilinx Spartan-3A DSP 1800 has 84 such block RAMs, and costs <$30 in volume.

Alternatively, if you want something implemented in the FPGA logic fabric itself, Jerry's one might be more suited to that. A 32-entry look-up table, with n-bit words, can be done with 2n logic cells. So say if we need 10-bit words, then that is 20 logic cells (plus something for the address generator). Not quite sure how the linear interpolation works - A Spartan-3 1800A has 33 thousand logic cells. Again, can be clocked at >200MHz
Reply by ●November 10, 2008
Bugger, I hit send accidentally. This time same post, but finished properly!

I'm not suggesting an FPGA implementation to the OP, but here is what it might look like.

Xilinx's value range of FPGAs have 18kbit block RAMs spread throughout the FPGA fabric. A single block RAM could do 256x36 bits (and 512x36 bits, of course). Presumably 36 bits per LUT entry is excessive, so it might be more accurate to go 1024x18? Plus a little bit of logic to generate the right address depending on the quadrant. The RAMs are dual ported, so you get two channels out of the one block RAM. And this could be clocked at >200MHz, i.e. sample rate >200MHz. A Xilinx Spartan-3A DSP 1800 has 84 such block RAMs, and costs <$30 in volume.

Alternatively, if you want something implemented in the FPGA logic fabric itself, Jerry's one might be more suited to that. A 32-entry look-up table, with n-bit words, can be done with 2n logic cells. So say if we need 10-bit words, then that is 20 logic cells (plus something for the address generator). Not quite sure how the linear interpolation works - some sort of add logic required as well, presumably? A Spartan-3 1800A has 33 thousand logic cells. Again, can be clocked at >200MHz.

Ignoring the number of entries in the look-up table, what effect does the look-up table wordlength have? Sure it must degrade the accuracy, but by how much, compared to the number of look-up table entries?

Cheers
Andrew
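Andrew's wordlength question can be probed numerically (a quick experiment of my own, floating point standing in for the hardware, and ignoring end-point saturation): round every table entry to a given number of fractional bits and measure the worst-case error of linear interpolation over one quadrant. Entry rounding contributes at most 2^-(bits+1), so wordlength dominates once that exceeds the table-size interpolation error of roughly ((pi/2)/N)^2 / 8.

```python
import math

def quantize(v, bits):
    """Round v to 'bits' fractional bits (idealized fixed point)."""
    scale = 1 << bits
    return round(v * scale) / scale

def max_err(n_entries, bits):
    """Worst-case error of linear interpolation into a quantized sin table
    over one quadrant [0, pi/2]."""
    tab = [quantize(math.sin((math.pi / 2) * k / n_entries), bits)
           for k in range(n_entries + 1)]
    worst = 0.0
    for i in range(20000):
        x = (math.pi / 2) * i / 19999
        t = x / (math.pi / 2) * n_entries
        k = min(int(t), n_entries - 1)
        y = tab[k] + (t - k) * (tab[k + 1] - tab[k])
        worst = max(worst, abs(y - math.sin(x)))
    return worst
```

For the 1024x18 case Andrew suggests (17 fractional bits plus sign), the rounding term (~3.8e-6) swamps the interpolation term (~3e-7), so that configuration is wordlength-limited rather than entry-limited; widening the words helps more than adding entries.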






