Reply by Tim Wescott July 2, 20052005-07-02
john wrote:

> > Tim Wescott wrote: > >>Jerry Avins wrote: >> >> >>>Michael Schoeberl wrote: >>> >>> >>>> >> [NORM function] >>>> >>>> >>>>>What makes a bit "redundant"? >>>> >>>> >>>> >>>>okay - this need more details ... >>>> >>>>the DSP has a one cycle instruction (called NORM) that counts >>>>the bits - starting from the left to the first change. >>>> >>>>I pass a 32 bit integer and get get back the number of >>>>leading zeros ... (works for ned number in 2 complement as well so TI >>>>calls the bits from the left 'redundant') >>>> >>>> >>>>the output of that instruction is similar to >>>>31 - floor(log2(x)) >>> >>> >>>That's an interesting instruction. What does it do with negative numbers? >>> >>> >>>>>What mantissa? Didn't you write "fixed point"? I see the line >>>>>"exp = floor(log2(x)) - 6;" below. How do you take a fixed-point log? >>>> >>>> >>>> >>>>with the help of this special assembly instruction ... >>> >>> >>>Thanks. I understand it all but the mantissa part. What is a fixed-point >>>mantissa? >>> >> >>If TI uses the same nomenclature as Analog Devices, the mantissa is >>what's left after you shift the number up by the bits counted by the >>NORM instruction. Analog Devices has designed their normalization >>instruction (which may or may not be NORM; I can't remember) so that it >>can be vectorized -- this lets you do block floating point calculations >>which makes oodles of sense in a vector processing environment. >> >>-- >>------------------------------------------- >>Tim Wescott >>Wescott Design Services >>http://www.wescottdesign.com > > > Yes, and I wish TI had done the same. Just to explain, on an ADI > processor one can write a one cycle per input loop that automatically > keeps track of minimum shift value for all the inputs. On a '54XX, one > has to find the minimum manually. > > John >
Ew ick. The Analog Devices 21xx DSPs are much better processors for making screaming-fast algorithms for a wide array of problems. Unfortunately they absolutely suck if you want an RTOS -- in fact IIRC they have some hardware stacks who's state you can't query and restore, so they're more or less RTOS proof. The TI 2812 DSP is just as good as the ADSP-21xx for "ordinary" DSP stuff like MACs, but can't go as far as the ADI parts. But saving the entire processor state is direct and clear, so fitting an RTOS onto the thing is easy as pie. I have no idea how the Blackfin compares, or other TI fixed-bitters, or anyone's floating point ones... -- ------------------------------------------- Tim Wescott Wescott Design Services http://www.wescottdesign.com
Reply by john July 2, 20052005-07-02

Tim Wescott wrote:
> Jerry Avins wrote: > > > Michael Schoeberl wrote: > > > >> >> [NORM function] > >> > >>> What makes a bit "redundant"? > >> > >> > >> > >> okay - this need more details ... > >> > >> the DSP has a one cycle instruction (called NORM) that counts > >> the bits - starting from the left to the first change. > >> > >> I pass a 32 bit integer and get get back the number of > >> leading zeros ... (works for ned number in 2 complement as well so TI > >> calls the bits from the left 'redundant') > >> > >> > >> the output of that instruction is similar to > >> 31 - floor(log2(x)) > > > > > > That's an interesting instruction. What does it do with negative numbers? > > > >>> What mantissa? Didn't you write "fixed point"? I see the line > >>> "exp = floor(log2(x)) - 6;" below. How do you take a fixed-point log? > >> > >> > >> > >> with the help of this special assembly instruction ... > > > > > > Thanks. I understand it all but the mantissa part. What is a fixed-point > > mantissa? > > > If TI uses the same nomenclature as Analog Devices, the mantissa is > what's left after you shift the number up by the bits counted by the > NORM instruction. Analog Devices has designed their normalization > instruction (which may or may not be NORM; I can't remember) so that it > can be vectorized -- this lets you do block floating point calculations > which makes oodles of sense in a vector processing environment. > > -- > ------------------------------------------- > Tim Wescott > Wescott Design Services > http://www.wescottdesign.com
Yes, and I wish TI had done the same. Just to explain, on an ADI processor one can write a one cycle per input loop that automatically keeps track of minimum shift value for all the inputs. On a '54XX, one has to find the minimum manually. John
Reply by Michael Schoeberl July 1, 20052005-07-01
> With 6 it's good to 15 bits -- it takes about 5 minutes to code it up > and try it with Matlab, MathCad, Octave, etc.
thanks - unfortunately this would take me much longer (I'm just an engineer without numerical experience) and here on the east coast it's time to go home now! ;-) more on this on Tuesday - have a nice weekend! bye, Michael
Reply by Tim Wescott July 1, 20052005-07-01
Michael Schoeberl wrote:

>> Even faster on a DSP is to use a 3 to 6 element Taylor's series. I'd >> take the "mantissa", scaled to be between 1/4 <= x_m < 1. Then I'd >> execute >> >> y = a_0 + a_1 * x_m + a_2 * x_m^2 ... > > > > ... multivariate least-squares fit ... > > can anyone tell me how accurate this can get with 3 or 6 coefficients? > > > bye, > Michael
With 6 it's good to 15 bits -- it takes about 5 minutes to code it up and try it with Matlab, MathCad, Octave, etc. -- ------------------------------------------- Tim Wescott Wescott Design Services http://www.wescottdesign.com
Reply by Michael Schoeberl July 1, 20052005-07-01
> I understand now. It's very clever. I would explain it a bit > differently, but that's just a mater of viewpoint.
thanks - could you give me a hint how you would explain this? (just the basic idea) I came across this by playing around and I still have no idea why I shift the sqrt(2) and why I need a constant sqrt(sqrt(2)) to correct this in the end!
> Would you like to write it up as a "Trick" for dspguru.com? > http://www.dspguru.com/comp.dsp/tricks/call.htm
ok - I'll do this next week ... bye, Michael
Reply by Michael Schoeberl July 1, 20052005-07-01
> Even faster on a DSP is to use a 3 to 6 element Taylor's series. I'd > take the "mantissa", scaled to be between 1/4 <= x_m < 1. Then I'd execute > > y = a_0 + a_1 * x_m + a_2 * x_m^2 ...
> ... multivariate least-squares fit ... can anyone tell me how accurate this can get with 3 or 6 coefficients? bye, Michael
Reply by Michael Schoeberl July 1, 20052005-07-01
>> If TI uses the same nomenclature as Analog Devices, the mantissa is >> what's left after you shift the number up by the bits counted by the >> NORM instruction.
uuuh - I shouldn't have used this word. I do not calculate the mantissa explicitly - I just take the "exponent" as an output from NORM and start shifting my input.
> To code an accurate square root, compute 1/sqrt(x) iteratively; that can > be done with shifts and multiplies. When done, multiply by x.
this takes quite some instructions! I tried this as well but I needed at least 2 iterations for Newton raphson and did not find a really good starting approximation. I'm down to 16 cycles which can get pipelines quite well.
> For sqrt(x^2+y^2), there's a quicker way than summing the squares and > rooting.
I found this here as well - they take max(x, y) + 0.5*min(x,y) but I needed to calculate something else bye, Michael
Reply by Jerry Avins July 1, 20052005-07-01
Michael Schoeberl wrote:
>>> the DSP has a one cycle instruction (called NORM) that counts >>> the bits - starting from the left to the first change. >>> >>> I pass a 32 bit integer and get get back the number of >>> leading zeros ... (works for ned number in 2 complement as well so TI >>> calls the bits from the left 'redundant') >>> >>> the output of that instruction is similar to >>> 31 - floor(log2(x)) > > >> That's an interesting instruction. What does it do with negative numbers? > > > It also counts - starting from the left to the first changing bit. > > As a negative number (in 2's complement) would be filled up with ones it > would then just count the ones and give you something like > 31 - floor(log2(-x)) > > (not exactly - there is a +1 or -1 missing - from the 2's complement > conversion but I can't remember the details) > > > > >>>> next idea: halven the exponent my shifting ... x >> exp/2 > >>>> (certainly the mantissa is wrong now) > >> >> Thanks. I understand it all but the mantissa part. What is a >> fixed-point mantissa? > > > I take my input and determine the log2 of it. I called this value the > "exponent" (as it *would* be similar to the exponent if I converted my > number to a floting representation - if I *would* convert it which I don't) > > > I just shift my input x by exponent/2 to the right. > > > I do nothing more - but for understanding it I can *think* of it as a > floating point with half the exponent now. This floating point value > would be close to my desired sqrt - except for the mantissa beeing wrong. > > What I got is still just a 32 bit value which is somehow near the value > of the desired sqrt(x) - but wrong by the amount of a wrong mantissa my > float *would* have ... > > > > > sorry for being that unclear - I've been thinking about this for some > days now and it's actually working on the DSP as well (so I'm quite > familiar with my thoughts by now ;-)
Thanks. I understand now. It's very clever. I would explain it a bit differently, but that's just a mater of viewpoint. The code would be the same. Would you like to write it up as a "Trick" for dspguru.com? http://www.dspguru.com/comp.dsp/tricks/call.htm Jerry -- Engineering is the art of making what you want from things you can get. &#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;
Reply by Tim Wescott July 1, 20052005-07-01
Jerry Avins wrote:

> Tim Wescott wrote: > >> Jerry Avins wrote: >> >>> Michael Schoeberl wrote: >>> >>>> >> [NORM function] >>>> >>>>> What makes a bit "redundant"? >>>> >>>> >>>> >>>> >>>> >>>> okay - this need more details ... >>>> >>>> the DSP has a one cycle instruction (called NORM) that counts >>>> the bits - starting from the left to the first change. >>>> >>>> I pass a 32 bit integer and get get back the number of >>>> leading zeros ... (works for ned number in 2 complement as well so >>>> TI calls the bits from the left 'redundant') >>>> >>>> >>>> the output of that instruction is similar to >>>> 31 - floor(log2(x)) >>> >>> >>> >>> >>> That's an interesting instruction. What does it do with negative >>> numbers? >>> >>>>> What mantissa? Didn't you write "fixed point"? I see the line >>>>> "exp = floor(log2(x)) - 6;" below. How do you take a fixed-point log? >>>> >>>> >>>> >>>> >>>> >>>> with the help of this special assembly instruction ... >>> >>> >>> >>> >>> Thanks. I understand it all but the mantissa part. What is a >>> fixed-point mantissa? >>> >> If TI uses the same nomenclature as Analog Devices, the mantissa is >> what's left after you shift the number up by the bits counted by the >> NORM instruction. Analog Devices has designed their normalization >> instruction (which may or may not be NORM; I can't remember) so that >> it can be vectorized -- this lets you do block floating point >> calculations which makes oodles of sense in a vector processing >> environment. > > > Thanks. Now I can take a closer look at the code. > > To code an accurate square root, compute 1/sqrt(x) iteratively; that can > be done with shifts and multiplies. When done, multiply by x. For > sqrt(x^2+y^2), there's a quicker way than summing the squares and > rooting. I first read it in Jon Bentley's Programming Pearls column and > it was discussed here a couple of years ago. Again go for the reciprocal > on machines with fast multiply and slow divide. > > Jerry
Even faster on a DSP is to use a 3 to 6 element Taylor's series. I'd take the "mantissa", scaled to be between 1/4 <= x_m < 1. Then I'd execute y = a_0 + a_1 * x_m + a_2 * x_m^2 ... On the ADI 21xx processors you can do this as fast as a MAC, on a TI processor it takes a few more cycles but it can still be vectorized and it goes by quite quickly. If you do the normalization first then a denormalization you probably only need three or four terms -- but these are DSP's and therefore counterintuitive -- it may be quicker to use 6 or 7 or 8 terms and not mess around with shifts and such. Note that you can truncate the series one or two elements early if you don't use an exact Taylor's expansion. Rather, you should find your coefficients by doing a multivariate least-squares fit on the vector 1, x, x^2, etc. Note also that I can take absolutely no credit for this basic idea: it's in the documentation for the ADI 210x -- page 57 of "Digital Signal Processing Applications using the ADSP-2100 Family" lists this basic algorithm for square root approximations. They use the same basic algorithm for cosine, arctan, etc. -- ------------------------------------------- Tim Wescott Wescott Design Services http://www.wescottdesign.com
Reply by Michael Schoeberl July 1, 20052005-07-01
>> the DSP has a one cycle instruction (called NORM) that counts >> the bits - starting from the left to the first change. >> >> I pass a 32 bit integer and get get back the number of >> leading zeros ... (works for ned number in 2 complement as well so TI >> calls the bits from the left 'redundant') >> >> the output of that instruction is similar to >> 31 - floor(log2(x))
> That's an interesting instruction. What does it do with negative numbers?
It also counts - starting from the left to the first changing bit. As a negative number (in 2's complement) would be filled up with ones it would then just count the ones and give you something like 31 - floor(log2(-x)) (not exactly - there is a +1 or -1 missing - from the 2's complement conversion but I can't remember the details) >>>> next idea: halven the exponent my shifting ... x >> exp/2 >>>> (certainly the mantissa is wrong now)
> > Thanks. I understand it all but the mantissa part. What is a fixed-point > mantissa?
I take my input and determine the log2 of it. I called this value the "exponent" (as it *would* be similar to the exponent if I converted my number to a floting representation - if I *would* convert it which I don't) I just shift my input x by exponent/2 to the right. I do nothing more - but for understanding it I can *think* of it as a floating point with half the exponent now. This floating point value would be close to my desired sqrt - except for the mantissa beeing wrong. What I got is still just a 32 bit value which is somehow near the value of the desired sqrt(x) - but wrong by the amount of a wrong mantissa my float *would* have ... sorry for being that unclear - I've been thinking about this for some days now and it's actually working on the DSP as well (so I'm quite familiar with my thoughts by now ;-) bye, Michael