DSPRelated.com
Forums

IQ mag: Integer approximation of square root of the sum of two squares

Started by Robmar, 1 year ago · 23 replies · latest reply 1 year ago · 305 views

Does anyone know the name of this algorithm, and a fast, integer-only alternative?

int16_t Mag(int16_t i, int16_t q)
{
    if (abs(i) > abs(q))
        return abs(i) + abs(q) / 4;
    else
        return abs(q) + abs(i) / 4;
}

Reply by DanBoschen, November 17, 2022

This is the "Alpha Max + Beta Min" algorithm, a high-speed approximation for the magnitude of a complex number (or more generally, the square root of the sum of two squares). Please see https://en.wikipedia.org/wiki/Alpha_max_plus_beta_...

Reply by Rick Lyons, November 17, 2022

Hi Dan. I learned Robmar's algorithm back in the 1980s and was told it was called "Max plus beta min." Back then I thought, with an additional multiply perhaps we could improve that algorithm's performance. So I started experimenting with multiplying the max value by an alpha value where 0 ≤ alpha ≤ 1. I decided to call the modified algorithm "Alpha max plus beta min."

Reply by Robmar, November 17, 2022

Interesting, even more so if you would share your work :)  Thanks!

Reply by Rick Lyons, November 17, 2022

Hello Robmar. My ol' DSP pal Grant Griffin posted material on the Alpha Max Plus Beta Min algorithm at: 

https://dspguru.com/dsp/tricks/magnitude-estimator...

Reply by DanBoschen, November 17, 2022

Very nice Rick! I didn't know that, thanks for the additional history and nice work. Until I reread the post, I was going to go forward from this point thinking that it was originally "Robmar's Algorithm" up until the day you improved it. But now I get that you meant the algorithm that the poster Robmar here described.

Reply by Robmar, November 17, 2022

Thanks for that, very useful!!

Reply by Cedron, November 17, 2022
Reply by DanBoschen, November 17, 2022

Related, but those don't really fit the bill, as they are used to compare two complex magnitudes rather than estimate the complex magnitude. However, what I didn't mention but is also an efficient estimator is the CORDIC rotator. (Use the CORDIC to rotate the angle to zero; the result on I is the magnitude times the CORDIC gain.)
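A minimal sketch of that CORDIC vectoring approach (my own function name and iteration count; the result carries the CORDIC gain of about 1.647, which you divide out or fold into later scaling):

```c
#include <stdlib.h>

/* CORDIC vectoring: rotate the phase toward zero with shift-style
   iterations; the magnitude accumulates in x, scaled by the CORDIC
   gain (about 1.6468 for many iterations).  Division by a power of
   two is used instead of >> so negative values stay well-defined
   in portable C. */
long cordic_mag(long i, long q)
{
    long x = labs(i);
    long y = (i < 0) ? -q : q;   /* fold into the right half-plane */
    for (int k = 0; k < 15; k++) {
        long xs = x / (1L << k), ys = y / (1L << k);
        if (y >= 0) { x += ys; y -= xs; }   /* rotate toward the real axis */
        else        { x -= ys; y += xs; }
    }
    return x;   /* ~1.6468 * sqrt(i*i + q*q) */
}
```

No multiplies or square roots: each iteration is two adds and two power-of-two divisions (shifts, on targets with arithmetic shift).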

Reply by Cedron, November 17, 2022
IIRC, some of the methods did this by computing the two values and comparing them.  Anyway, the post covered the OP's equation in some detail, as well as many other issues having to do with an integer implementation.

Close enough to be very useful, IMO.  The big challenge there was avoiding any multiplications, hence a lot of shifts in the solutions.

Here is how I think I would do this one:

=====================
prep()

    for M = 0 to 2^5
      A = atan( M / (2^5) )
      
      XF[M] = 2^7 * cos( A )
      YF[M] = 2^7 * sin( A )
    next 

=====================
mag( I, Q )

    AI = abs( I )
    AQ = abs( Q )
    
    if AI + AQ = 0 return 0

    if AI < AQ then
       X = AQ : Y = AI
    else
       X = AI : Y = AQ
       
    M = ( Y << 5 ) / X

    D = ( X * XF[M] + Y * YF[M] ) >> 7

    return D
    
=====================


Here, the '5' gives a 2^5 + 1 = 33 place lookup table.

The '7' is the fixed-point fraction offset (7 bits) of the table values.

Why it works.

---------------
X = R cos( A )
Y = R sin( A )

X * XF + Y * YF = R cos( A ) * cos( A ) + R sin( A ) * sin( A )
                = R ( cos^2( A ) + sin^2( A ) )
                = R
---------------


You can also think of it as a local plane approximation to sqrt( x^2 + y^2 ), or a rotation of the point to the real axis.

The OP didn't specify any tolerance, so I have no recommendation for the table size.
Reply by CharlieRader, November 17, 2022

I have known about this for decades, but never saw it called an integer approximation. In the early days of my career, a square root operation was a big expense, but I noticed that most of the time, one ended up comparing the computed square root of the sum of squares to some threshold, and it was easier to compare the sum of the squares to the square of the threshold.
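Charlie's threshold trick, as a sketch in C (names are mine): compare the sum of squares against the squared threshold and skip the square root entirely.

```c
#include <stdint.h>

/* Detect |(i,q)| >= threshold without a square root:
   compare i*i + q*q against threshold*threshold.
   64-bit accumulation avoids any overflow for 16-bit inputs. */
int exceeds_threshold(int16_t i, int16_t q, int16_t threshold)
{
    int64_t sos = (int64_t)i * i + (int64_t)q * q;
    int64_t t2  = (int64_t)threshold * threshold;
    return sos >= t2;
}
```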

Reply by philipoakley, November 17, 2022

It is also worth reflecting on how this affects investigations of signals in noise and the apparent sharpness of the transition. 

In a broad-band signal you can add 50% noise for only a 10% increase in the total power, and likewise if the signal is a full 50% of the noise, you won't see its 10% overall contribution.

If you are ever testing for susceptibility you should try 'synchronising' the noise injection to a visible alias frequency of the equipment operating frequency. Unfortunately it's not mentioned in any of the standards. Essentially you look to make the injected noise look like 'hum bars' on the signal (hum bars were a feature of US TV sets, where the AC supply frequency (and its harmonics) was a fraction different (apparently by design) from the TV line frequency). It works wherever you have a synchronised 'image' display, because everyone can see the obvious moving injected noise!

Reply by Robmar, November 17, 2022

I haven't got my head around the varied use of noise injection yet.  I was reading about it in the technique that increases ADC resolution: if there is enough noise on the sampled signal (or noise is added), an increase of 1 or more bits can be achieved.  I mean, if the signal is so stable that you need noise to flip the LS bit, what's the gain?!  Thanks for your reply

Reply by philipoakley, November 17, 2022

Years ago (on a Sinclair QL IIRC) I did calcs regarding adding rms noise equal to 1/2 lsb and found that with sufficient averaging, you got at least 8 extra bits of linearity. You need 2^(2N) samples to get N bits of extra linearity.
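The averaging effect philip describes can be simulated. A rough sketch, using uniform 1-LSB dither rather than Gaussian for simplicity (function name and parameters are my own invention): with dither, the average converges to the true sub-LSB value that an undithered quantizer would hide.

```c
#include <stdlib.h>

/* Quantize a constant input (in LSB units) many times with uniform
   dither of one LSB peak-to-peak, then average.  The average converges
   to the true fractional value; without dither the quantizer would
   return the same code every time. */
double dithered_average(double value_lsb, int n_samples, unsigned seed)
{
    srand(seed);
    double sum = 0.0;
    for (int n = 0; n < n_samples; n++) {
        double dither = (double)rand() / ((double)RAND_MAX + 1.0); /* [0, 1) */
        sum += (double)(long)(value_lsb + dither);  /* truncating quantizer */
    }
    return sum / n_samples;
}
```

For example, averaging 2^16 dithered samples of a constant 2.3 LSB input recovers a value very close to 2.3, consistent with the 2^(2N)-samples-for-N-extra-bits rule.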


I suspect it's also a part of the reason we get DAC / ADC non-linearity limits of a 2-lsb step, so 1 lsb rms (Gaussian) would hide it ;-).


The spike noise injection (e.g. close capacitive coupling) is useful with systems that have sampled ADCs, such as cameras and imagers, so the aperture time for a single noise spike is low, but if you make it synchronised to a scrolling pattern, the eye will pick it out really easily. Had that happen many years ago with a perfect 600Hz drive signal interfering with an NTSC timed imager system and generating beautiful scrolling bands, even though all the RFI tests passed.

Reply by timbly5000, November 17, 2022

There are variations on the integer IQ magnitude approximation (still with compute-friendly coefficient values) that give a lower error than the one posted above, and with an error that is better centred about zero. If I recall correctly:

mag ~= 7/8 abs(largest) + 1/2 abs(smallest)

and better still mag ~= 15/16 abs(largest) + 1/2 abs(smallest)

(7/8 and 15/16 can be computed using a shift and a subtract)
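Those coefficients reduce to shifts and adds. A sketch of the 15/16, 1/2 variant (my function name; inputs assumed well inside the int16_t range so abs() and the sum don't overflow):

```c
#include <stdint.h>
#include <stdlib.h>

/* mag ~= 15/16 * max(|i|,|q|) + 1/2 * min(|i|,|q|)
   15/16 * x is computed as x - x/16: one shift and one subtract,
   no multiply.  Inputs assumed > INT16_MIN so abs() is safe. */
int16_t mag_15_16(int16_t i, int16_t q)
{
    int16_t ai = (int16_t)abs(i), aq = (int16_t)abs(q);
    int16_t hi = (ai > aq) ? ai : aq;
    int16_t lo = (ai > aq) ? aq : ai;
    return (int16_t)((hi - (hi >> 4)) + (lo >> 1));
}
```

For the classic 3-4-5 triangle this variant is exact: mag_15_16(3, 4) returns 5, where the max + min/4 form returns 4.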


On a side note, if you need log magnitude and can afford the time needed for a log10 call then using 10 * log10 (sum of squares) gives you the square root for free.

Tim.

Reply by Robert Wolfe, November 17, 2022

I learned something today!  Updated my spreadsheet to confirm :)

I    Q     SOS      Z = sqrt((I*I)+(Q*Q))  20*log10(Z)  10*log10(SOS)  "Mag" function  % error
20   0.01  400.000  20.000                 26.021       26.021         20.0025         0.012
20   1     401.000  20.025                 26.031       26.031         20.25           1.124
20   5     425.000  20.616                 26.284       26.284         21.25           3.078
Robert
Reply by Robmar, November 17, 2022

Thanks Tim, that's helpful, I'll give those a try!

Reply by drmike, November 17, 2022

Check out this post: 

https://www.dsprelated.com/showthread/comp.dsp/652...


Same math, different reason.

Reply by Robmar, November 17, 2022

Thanks for the link, it's an interesting post, but too many CPU cycles!

Reply by Robert Wolfe, November 17, 2022

I guess with z = sqrt( ( I * I ) + ( Q * Q ) ), as Q becomes <<< I, the contribution from Q becomes less and less, and the result approaches sqrt( I * I ), or abs(I), plus a small correction, which is the abs(Q)/4.  And vice versa.

I did some quick spreadsheetin' as empirical proof:

I    Q     z       Mag fn   % error
20   0.01  20.000  20.0025  0.012
20   1     20.025  20.25    1.124
20   5     20.616  21.25    3.078

I think this is already a fast approximation to the full sqrt operation (sqrt being costly in CPU cycles), using integer values.  The big hit in your routine is in the pipeline-clearing logical If-Else.  If you knew for certain that Q was always <<< I, or vice versa, you could get rid of that by just using one of those two Return statements.

Reply by Robmar, November 17, 2022

Hi, thanks for your reply, that's interesting!  Maybe it is about the best there is.

Reply by Cedron, November 17, 2022
Here is an improved answer over the one in my earlier reply to Dan.

The '7' is actually the number of effective significant bits.  Headroom calculations are application dependent.

Otherwise, without too large a table, you should be able to get very reasonable results.

=====================
prep()

    for M = 0 to 2^5
      A = atan( M / (2^5) )
      
      Cos_M[M] = 2^7 * cos( A )
      Sin_M[M] = 2^7 * sin( A )
    next 

=====================
mag( I, Q )

    AI = abs( I )
    AQ = abs( Q )
   
    if AI < AQ then
       M = ( AI << 5 ) / AQ
       D = ( AQ * Cos_M[M] + AI * Sin_M[M] ) >> 7
    else
       if AI = 0 return 0
       M = ( AQ << 5 ) / AI
       D = ( AI * Cos_M[M] + AQ * Sin_M[M] ) >> 7

    return D
    
=====================
Reply by Cedron, November 17, 2022

This answer is what you might be looking for.

https://dsp.stackexchange.com/questions/62893/#630...

It divides the arc up without the division, and multiplies the coefficients with hard coded shifts and adds.

Since you are "counting cycles", I would say, "test it, and time it" instead.  

Reply by jtp_1960, November 17, 2022

In your code, abs() is called six times when you only need to call it twice, and also MUL is usually cheaper than DIV, so the /4 could be changed to *0.25. How much you gain in execution speed depends on your compiler and options... . If you do it in C++, current GCC and Clang compilers also do a good job with those std library functions (maybe with sqrt() and abs() too; I have not benchmarked these two).

EDIT: https://www.desmos.com/calculator/vtrllpldhf

Looks like (when compiled using C++/GCC -O3 -ffast-math) the library sqrt(), as

int16_t intsqrt(int16_t x, int16_t y){
   return sqrt(x * x + y * y);   // note: the result can exceed int16_t range for large x, y
}

is just a little (~0.3 rdtsc/val) slower than the function (with the small modifications mentioned in the discussion) given in the opening post, so why approximate (unless of course you use another programming language, or a compiler that does not optimize well)?

Reply by Cedron, November 17, 2022
That's a good rule for floating point, but those are integers.  A multiply by 0.25 would force several unnecessary conversions.  The /4 can be replaced by >>2 (fine if truncation is acceptable; otherwise add 2 first to round to nearest), which is likely done by the optimizer.

The abs calls can only occur four times in any particular call.  Again, an optimizer will catch this with temporary variables implemented as registers, but since this is pedantic mode, your point is true. 

Getting hardcore, it used to be that "if x < 0 then x = -x" was faster than "abs()".  You just can't tell anymore with pipelines and predictive branching and stuff.  Like I said in my post, code the alternatives and test them. Compilers have gotten rather sophisticated and tend to beat hand assembly nowadays.
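For reference, the branch-free form of that old abs trick (a sketch; assumes two's complement and the common arithmetic right shift of signed values):

```c
#include <stdint.h>

/* Branch-free absolute value for 16-bit integers:
   mask is all ones when x is negative, all zeros otherwise,
   and (x ^ mask) - mask negates x exactly when mask is set.
   Assumes arithmetic right shift of signed values (common,
   though implementation-defined in C); breaks for INT16_MIN. */
int16_t abs16(int16_t x)
{
    int16_t mask = (int16_t)(x >> 15);
    return (int16_t)((x ^ mask) - mask);
}
```

Whether this beats a compare-and-negate depends entirely on the target, which is Cedron's point: time both.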