comp.dsp | sqrt on C6414 DSP

Hi ...

I was working on the C6414 DSP (fixed point) and needed an sqrt 
approximation - I finally found a really nice solution (but I have 
trouble understanding it :-)

my assumptions:
- fixed point representation
- no division ... too slow for me
- approximation is fine ... sqrt(a^2 + b^2) ~ max(a,b) + .5min(a,b)
was not possible in this case ...



my solution:
The processor has an asm instruction called NORM, it gives me the number
of "redundant bits" of the argument ... the result can be expressed as
something like _norm(x) = 31 - floor(log2(x))
... this gives me the exponent of my input x



next idea: halven the exponent my shifting ... x >> exp/2
(certainly the mantissa is wrong now)

somehow I tried to shift 1 << exp/2 and finally something like
sqrt(2) << (exp+1)/2
(that also is wrong by the same amount, different sign!)


the average of both values turned out to be off by a constant factor of
sqrt(sqrt(2)) for even exponent (and sqrt(sqrt(1/2)) for odd exp)
(so this is just one multiplication by a constant factor)


the final result looks like a piecewise linear approximation with an
relative error of 2-3% at most with just 10 asm instructions.




the question: I found most of the constants like sqrt(2) and
sqrt(sqrt(2)) by playing around and the idea of "hey - a little above
1.4 looks like sqrt(2)!" ... no systematics involved


I'm not familiar with discrete arithmetics - how can you analyze this? 
why is it working? is this new? couldn't find anything similar anywhere!



bye,
Michael


here is some matlab code displaying the intermediate steps.

clear all; close all;

mval = 2.^8; % fixed point shift value
num = 512; % number of inputs
x = (num:-1:0); % input values


figure; hold on;
opt = sqrt(x./mval).*mval; plot(opt./mval, 'g:'); % optimum result
ropt = floor(opt+.5); plot(ropt./mval, 'b:'); % rounded optimum


exp = floor(log2(x)) - 6;  %%   exp = 31 - norm(x)
outp = floor(x .* (2 .^ floor(-exp/2)));   %% x = x >> exp/2
outp2 = floor(91 .* (2 .^ floor(exp/2)));   %% x = sqrt(2) << exp/2


plot(outp./mval.*2, 'r-');
plot(outp2./mval.*2, 'r:');

outp3 = outp + outp2; % average both results, 1/2 is hidden above
plot(outp3./mval, 'b-');


% correction of even / odd pixels - mult by constant
even = floor(exp/2).*2 ~= exp;
outp3(find(even))   = outp3(find(even))   .* 303 ./mval;  %% 
floor(sqrt(sqrt(2))   .* mval)
outp3(find(1-even)) = outp3(find(1-even)) .* 214 ./mval;  %% 
floor(sqrt(sqrt(1/2)) .* mval)
outp3 = floor(outp3);


% abs error display
figure; hold on;
plot( (outp3-opt)./mval, 'r:');
plot((ropt-opt)./mval, 'b');


% rel error display
figure; hold on;
plot( (outp3-opt)./opt, 'r:');
plot((ropt-opt)./opt, 'b');

Reply by Jerry Avins ●July 1, 20052005-07-01

Michael Schoeberl wrote:
> Hi ...
> 
> I was working on the C6414 DSP (fixed point) and needed an sqrt 
> approximation - I finally found a really nice solution (but I have 
> trouble understanding it :-)
> 
> my assumptions:
> - fixed point representation
> - no division ... too slow for me
> - approximation is fine ... sqrt(a^2 + b^2) ~ max(a,b) + .5min(a,b)
> was not possible in this case ...
> 
> 
> 
> my solution:
> The processor has an asm instruction called NORM, it gives me the number
> of "redundant bits" of the argument ... the result can be expressed as
> something like _norm(x) = 31 - floor(log2(x))
> ... this gives me the exponent of my input x

What makes a bit "redundant"?

> next idea: halven the exponent my shifting ... x >> exp/2
> (certainly the mantissa is wrong now)

What mantissa? Didn't you write "fixed point"?   I see the line
"exp = floor(log2(x)) - 6;" below. How do you take a fixed-point log?

   ...

Jerry
-- 
Engineering is the art of making what you want from things you can get.
&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;

Reply by Michael Schoeberl ●July 1, 20052005-07-01

 >> [NORM function]
> What makes a bit "redundant"?

okay - this need more details ...

the DSP has a one cycle instruction (called NORM) that counts
the bits - starting from the left to the first change.

I pass a 32 bit integer and get get back the number of
leading zeros ... (works for ned number in 2 complement as well so TI 
calls the bits from the left 'redundant')


the output of that instruction is similar to
31 - floor(log2(x))


> What mantissa? Didn't you write "fixed point"?   I see the line
> "exp = floor(log2(x)) - 6;" below. How do you take a fixed-point log?

with the help of this special assembly instruction ...



bye,
Michael

Reply by Michael Schoeberl ●July 1, 20052005-07-01

 > on the C6414 DSP (fixed point) ... sqrt

I got a C6416 and the instruction set reference says the NORM 
instruction is available on all C6000 DSPs from TI.


oh man - it's time for the long weekend!


bye,
Michael

Reply by Jerry Avins ●July 1, 20052005-07-01

Michael Schoeberl wrote:
>  >> [NORM function]
> 
>> What makes a bit "redundant"?
> 
> 
> okay - this need more details ...
> 
> the DSP has a one cycle instruction (called NORM) that counts
> the bits - starting from the left to the first change.
> 
> I pass a 32 bit integer and get get back the number of
> leading zeros ... (works for ned number in 2 complement as well so TI 
> calls the bits from the left 'redundant')
> 
> 
> the output of that instruction is similar to
> 31 - floor(log2(x))

That's an interesting instruction. What does it do with negative numbers?

>> What mantissa? Didn't you write "fixed point"?   I see the line
>> "exp = floor(log2(x)) - 6;" below. How do you take a fixed-point log?
> 
> 
> with the help of this special assembly instruction ...

Thanks. I understand it all but the mantissa part. What is a fixed-point 
mantissa?

Jerry
-- 
Engineering is the art of making what you want from things you can get.
&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;

Reply by Tim Wescott ●July 1, 20052005-07-01

Jerry Avins wrote:

> Michael Schoeberl wrote:
> 
>>  >> [NORM function]
>>
>>> What makes a bit "redundant"?
>>
>>
>>
>> okay - this need more details ...
>>
>> the DSP has a one cycle instruction (called NORM) that counts
>> the bits - starting from the left to the first change.
>>
>> I pass a 32 bit integer and get get back the number of
>> leading zeros ... (works for ned number in 2 complement as well so TI 
>> calls the bits from the left 'redundant')
>>
>>
>> the output of that instruction is similar to
>> 31 - floor(log2(x))
> 
> 
> That's an interesting instruction. What does it do with negative numbers?
> 
>>> What mantissa? Didn't you write "fixed point"?   I see the line
>>> "exp = floor(log2(x)) - 6;" below. How do you take a fixed-point log?
>>
>>
>>
>> with the help of this special assembly instruction ...
> 
> 
> Thanks. I understand it all but the mantissa part. What is a fixed-point 
> mantissa?
> 
If TI uses the same nomenclature as Analog Devices, the mantissa is 
what's left after you shift the number up by the bits counted by the 
NORM instruction.  Analog Devices has designed their normalization 
instruction (which may or may not be NORM; I can't remember) so that it 
can be vectorized -- this lets you do block floating point calculations 
which makes oodles of sense in a vector processing environment.

-- 
-------------------------------------------
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Reply by Jerry Avins ●July 1, 20052005-07-01

Tim Wescott wrote:
> Jerry Avins wrote:
> 
>> Michael Schoeberl wrote:
>>
>>>  >> [NORM function]
>>>
>>>> What makes a bit "redundant"?
>>>
>>>
>>>
>>>
>>> okay - this need more details ...
>>>
>>> the DSP has a one cycle instruction (called NORM) that counts
>>> the bits - starting from the left to the first change.
>>>
>>> I pass a 32 bit integer and get get back the number of
>>> leading zeros ... (works for ned number in 2 complement as well so TI 
>>> calls the bits from the left 'redundant')
>>>
>>>
>>> the output of that instruction is similar to
>>> 31 - floor(log2(x))
>>
>>
>>
>> That's an interesting instruction. What does it do with negative numbers?
>>
>>>> What mantissa? Didn't you write "fixed point"?   I see the line
>>>> "exp = floor(log2(x)) - 6;" below. How do you take a fixed-point log?
>>>
>>>
>>>
>>>
>>> with the help of this special assembly instruction ...
>>
>>
>>
>> Thanks. I understand it all but the mantissa part. What is a 
>> fixed-point mantissa?
>>
> If TI uses the same nomenclature as Analog Devices, the mantissa is 
> what's left after you shift the number up by the bits counted by the 
> NORM instruction.  Analog Devices has designed their normalization 
> instruction (which may or may not be NORM; I can't remember) so that it 
> can be vectorized -- this lets you do block floating point calculations 
> which makes oodles of sense in a vector processing environment.

Thanks. Now I can take a closer look at the code.

To code an accurate square root, compute 1/sqrt(x) iteratively; that can 
be done with shifts and multiplies. When done, multiply by x. For 
sqrt(x^2+y^2), there's a quicker way than summing the squares and 
rooting. I first read it in Jon Bentley's Programming Pearls column and 
it was discussed here a couple of years ago. Again go for the reciprocal 
on machines with fast multiply and slow divide.

Jerry
-- 
Engineering is the art of making what you want from things you can get.
&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;

Reply by Tim Wescott ●July 1, 20052005-07-01

Jerry Avins wrote:

> Tim Wescott wrote:
> 
>> Jerry Avins wrote:
>>
>>> Michael Schoeberl wrote:
>>>
>>>>  >> [NORM function]
>>>>
>>>>> What makes a bit "redundant"?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> okay - this need more details ...
>>>>
>>>> the DSP has a one cycle instruction (called NORM) that counts
>>>> the bits - starting from the left to the first change.
>>>>
>>>> I pass a 32 bit integer and get get back the number of
>>>> leading zeros ... (works for ned number in 2 complement as well so 
>>>> TI calls the bits from the left 'redundant')
>>>>
>>>>
>>>> the output of that instruction is similar to
>>>> 31 - floor(log2(x))
>>>
>>>
>>>
>>>
>>> That's an interesting instruction. What does it do with negative 
>>> numbers?
>>>
>>>>> What mantissa? Didn't you write "fixed point"?   I see the line
>>>>> "exp = floor(log2(x)) - 6;" below. How do you take a fixed-point log?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> with the help of this special assembly instruction ...
>>>
>>>
>>>
>>>
>>> Thanks. I understand it all but the mantissa part. What is a 
>>> fixed-point mantissa?
>>>
>> If TI uses the same nomenclature as Analog Devices, the mantissa is 
>> what's left after you shift the number up by the bits counted by the 
>> NORM instruction.  Analog Devices has designed their normalization 
>> instruction (which may or may not be NORM; I can't remember) so that 
>> it can be vectorized -- this lets you do block floating point 
>> calculations which makes oodles of sense in a vector processing 
>> environment.
> 
> 
> Thanks. Now I can take a closer look at the code.
> 
> To code an accurate square root, compute 1/sqrt(x) iteratively; that can 
> be done with shifts and multiplies. When done, multiply by x. For 
> sqrt(x^2+y^2), there's a quicker way than summing the squares and 
> rooting. I first read it in Jon Bentley's Programming Pearls column and 
> it was discussed here a couple of years ago. Again go for the reciprocal 
> on machines with fast multiply and slow divide.
> 
> Jerry
Even faster on a DSP is to use a 3 to 6 element Taylor's series.  I'd 
take the "mantissa", scaled to be between 1/4 <= x_m < 1.  Then I'd execute

y = a_0 + a_1 * x_m + a_2 * x_m^2 ...

On the ADI 21xx processors you can do this as fast as a MAC, on a TI 
processor it takes a few more cycles but it can still be vectorized and 
it goes by quite quickly.  If you do the normalization first then a 
denormalization you probably only need three or four terms -- but these 
are DSP's and therefore counterintuitive -- it may be quicker to use 6 
or 7 or 8 terms and not mess around with shifts and such.

Note that you can truncate the series one or two elements early if you 
don't use an exact Taylor's expansion.  Rather, you should find your 
coefficients by doing a multivariate least-squares fit on the vector 1, 
x, x^2, etc.

Note also that I can take absolutely no credit for this basic idea: 
it's in the documentation for the ADI 210x -- page 57 of "Digital Signal 
Processing Applications using the ADSP-2100 Family" lists this basic 
algorithm for square root approximations.  They use the same basic 
algorithm for cosine, arctan, etc.

-- 
-------------------------------------------
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Reply by Michael Schoeberl ●July 1, 20052005-07-01

>> the DSP has a one cycle instruction (called NORM) that counts
>> the bits - starting from the left to the first change.
>>
>> I pass a 32 bit integer and get get back the number of
>> leading zeros ... (works for ned number in 2 complement as well so TI 
>> calls the bits from the left 'redundant')
>>
>> the output of that instruction is similar to
>> 31 - floor(log2(x))

> That's an interesting instruction. What does it do with negative numbers?

It also counts - starting from the left to the first changing bit.

As a negative number (in 2's complement) would be filled up with ones it 
would then just count the ones and give you something like
31 - floor(log2(-x))

(not exactly - there is a +1 or -1 missing - from the 2's complement 
conversion but I can't remember the details)



 >>>> next idea: halven the exponent my shifting ... x >> exp/2
 >>>> (certainly the mantissa is wrong now)
> 
> Thanks. I understand it all but the mantissa part. What is a fixed-point 
> mantissa?

I take my input and determine the log2 of it. I called this value the 
"exponent" (as it *would* be similar to the exponent if I converted my 
number to a floting representation - if I *would* convert it which I don't)


I just shift my input x by exponent/2 to the right.


I do nothing more - but for understanding it I can *think* of it as a 
floating point with half the exponent now. This floating point value 
would be close to my desired sqrt - except for the mantissa beeing wrong.

What I got is still just a 32 bit value which is somehow near the value 
of the desired sqrt(x) - but wrong by the amount of a wrong mantissa my 
float *would* have ...




sorry for being that unclear - I've been thinking about this for some 
days now and it's actually working on the DSP as well (so I'm quite 
familiar with my thoughts by now ;-)



bye,
Michael

Reply by Jerry Avins ●July 1, 20052005-07-01

Michael Schoeberl wrote:
>>> the DSP has a one cycle instruction (called NORM) that counts
>>> the bits - starting from the left to the first change.
>>>
>>> I pass a 32 bit integer and get get back the number of
>>> leading zeros ... (works for ned number in 2 complement as well so TI 
>>> calls the bits from the left 'redundant')
>>>
>>> the output of that instruction is similar to
>>> 31 - floor(log2(x))
> 
> 
>> That's an interesting instruction. What does it do with negative numbers?
> 
> 
> It also counts - starting from the left to the first changing bit.
> 
> As a negative number (in 2's complement) would be filled up with ones it 
> would then just count the ones and give you something like
> 31 - floor(log2(-x))
> 
> (not exactly - there is a +1 or -1 missing - from the 2's complement 
> conversion but I can't remember the details)
> 
> 
> 
>  >>>> next idea: halven the exponent my shifting ... x >> exp/2
>  >>>> (certainly the mantissa is wrong now)
> 
>>
>> Thanks. I understand it all but the mantissa part. What is a 
>> fixed-point mantissa?
> 
> 
> I take my input and determine the log2 of it. I called this value the 
> "exponent" (as it *would* be similar to the exponent if I converted my 
> number to a floting representation - if I *would* convert it which I don't)
> 
> 
> I just shift my input x by exponent/2 to the right.
> 
> 
> I do nothing more - but for understanding it I can *think* of it as a 
> floating point with half the exponent now. This floating point value 
> would be close to my desired sqrt - except for the mantissa beeing wrong.
> 
> What I got is still just a 32 bit value which is somehow near the value 
> of the desired sqrt(x) - but wrong by the amount of a wrong mantissa my 
> float *would* have ...
> 
> 
> 
> 
> sorry for being that unclear - I've been thinking about this for some 
> days now and it's actually working on the DSP as well (so I'm quite 
> familiar with my thoughts by now ;-)

Thanks. I understand now. It's very clever. I would explain it a bit 
differently, but that's just a mater of viewpoint. The code would be the 
same. Would you like to write it up as a "Trick" for dspguru.com? 
http://www.dspguru.com/comp.dsp/tricks/call.htm

Jerry
-- 
Engineering is the art of making what you want from things you can get.
&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;&#4294967295;

Previous12 Next

sqrt on C6414 DSP

Sign in

You might also like...

Search forums

Free PDF Downloads

Blogs - Hall of Fame

Discussion Groups

Quick Links

About DSPRelated.com

Social Networks

The Related Media Group