Vector Rotation
Started by Greg Berchin ●February 26, 2006

Looking for an algorithm to accomplish the following in an FPGA/ASIC. Accuracy needs to be very high: errors less than 1 part per million. Speed, of course, is also crucial, but given the implementation context I can pipeline to my heart's content.

I have two two-dimensional vectors of arbitrary length and direction; call them A and B. I need to find the vector A' that is the same length as A but in the direction of B. Of course, I could simply compute

  A' = (|A|/|B|)*B

but that requires a square root and a divide. I thought about using some variant of CORDIC to rotate A onto B, but I wonder if there's a better way.

Thanks,
Greg Berchin
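For reference, the "obvious" computation from the question as a plain software model -- useful as a golden reference to check any of the hardware-friendly schemes in this thread against the 1 ppm target. (Python here purely for illustration; the function name is mine.)

```python
import math

def scale_to_direction(A, B):
    """Reference model: A' = (|A|/|B|) * B -- exactly the square root
    and divide we would like to avoid in the FPGA/ASIC datapath."""
    s = math.hypot(A[0], A[1]) / math.hypot(B[0], B[1])
    return (s * B[0], s * B[1])
```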
Reply by ●February 27, 2006
Greg Berchin wrote:
> Looking for an algorithm to accomplish the following in FPGA/ASIC.
> Accuracy needs to be very high; errors less than 1 part per million.
> [...]
> I have two two-dimensional vectors of arbitrary length and direction;
> call them A and B. I need to find vector A' that is the same length as
> A but in the direction of B. Of course, I could simply compute
> A' = (|A|/|B|)*B
> but that requires a square root and a divide.

How bad is a square root and a divide on your platform? Both square root and division can be done via Newton-Raphson if you can do multiplications (for the division, use N-R to find the inverse, then multiply). Is this not better than any other method you could think of?

The only alternative that comes to mind (which does not mean that it is the only alternative, of course) is to do a binary search on the magnitude -- or rather, on the scaling factor for vector B required to obtain A'. The comparison criterion for the binary search would be the magnitude *squared*; that way, you only need two multiplications and one addition -- no square roots. Basically, you do a binary search for the scaling factor f that makes f*f*|B|^2 equal |A|^2 -- once you obtain f to the required precision, multiply each of B's coordinates by f.

The thing is, would this be better than using Newton-Raphson to obtain the square root and the inverse of |B|?

HTH,
Carlos
--
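Carlos's binary search on the scaling factor might look like this in software. One assumption on my part: the factor is taken to lie in [0, 2), i.e. |A| < 2|B|; in hardware you would normalize the inputs into such a range, and the loop unrolls into one pipeline stage per result bit.

```python
def binary_search_scale(A, B, bits=30):
    """Binary-search the f with f*f*|B|^2 == |A|^2 -- no sqrt, no divide.
    Assumes 0 <= f < 2 (|A| < 2|B|); normalize the inputs otherwise."""
    a2 = A[0]*A[0] + A[1]*A[1]        # |A|^2, computed once
    b2 = B[0]*B[0] + B[1]*B[1]        # |B|^2, computed once
    f, step = 0.0, 1.0
    for _ in range(bits):             # one bit of f per (pipelineable) stage
        t = f + step
        if t * t * b2 <= a2:          # compare squared magnitudes
            f = t
        step *= 0.5
    return (f * B[0], f * B[1])
```

With 30 stages the factor is resolved to about 2^-29, comfortably inside the 1 ppm budget.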
Reply by ●February 27, 2006
On Sun, 26 Feb 2006 23:07:27 -0500, Carlos Moreno <moreno_at_mochima_dot_com@mailinator.com> wrote:

> How bad is a square root and a divide on your platform?

Bad enough that they are almost universally avoided.

> Both square root and division can be done via Newton-Raphson
> if you can do multiplications (for the division, use N-R to
> find the inverse, then multiply).

A good suggestion. For some reason I seem to overlook Newton-Raphson as one of the "obvious" choices; I guess I've never liked the fact that it can diverge in some cases. There should be no such problems in this situation.

> The only alternative that comes to mind (which does not mean
> that it is the only alternative, of course) is to do binary
> search on the magnitude -- or rather, on the scaling factor
> for vector B required to obtain A'

Conceptually I like this idea. Even though it requires as many stages as there are bits in the data word, its simplicity is attractive.

Many thanks,
Greg Berchin
Reply by ●February 27, 2006
Greg Berchin wrote:

>> Both square root and division can be done via Newton-Raphson
>> if you can do multiplications (for the division, use N-R to
>> find the inverse, then multiply).
>
> A good suggestion. For some reason I seem to overlook Newton-Raphson as
> one of the "obvious" choices; I guess I've never liked the fact that it
> can diverge in some cases. Should be no such problems in this situation.

I'm pretty certain that for the square root, N-R is guaranteed to work (I don't know of a proof, but I wouldn't be surprised if it has been rigorously proven) -- if you start with an initial estimate of x (for sqrt(x)) when x > 1, and 1 when x < 1. As for the inverse, the function's derivative is well-behaved enough that I'm quite sure N-R should also converge.

>> The only alternative that comes to mind [...] is to do binary
>> search on the magnitude -- or rather, on the scaling factor
>> for vector B required to obtain A'
>
> Conceptually I like this idea. Even though it requires as many stages
> as there are bits in the data word, its simplicity is attractive.

After posting, I thought of a similar method based on the same idea: instead of comparing squared magnitudes, use the vector product (cross product) to compute the area of the triangle formed by the origin and the points A' and B (i.e., the vectors seen as points in the plane), and iterate, doing binary search on the rotation of A until the area of the triangle is zero -- or equivalently, until the magnitude of the z-coordinate of the cross product of A' and B is 0. The cross product is far simpler than the squared magnitude times the scaling factor squared: the z-coordinate is Xa*Yb - Xb*Ya.

The only catch is that the process of obtaining and applying the rotation matrices would be more complex. This is attenuated by the good news that the binary search on the right rotation angle does not require extensive computation -- you're halving the angles, so you can obtain cos(a/2) and sin(a/2) in terms of cos(a) and sin(a) (hopefully without involving divisions or square roots?? You'll have to check that).

Anyway, I rather doubt that either of these two binary-search approaches will be more efficient than N-R for the sqrt and the inverse; but I thought I'd mention this variation, given that you liked the idea.

HTH,
Carlos
--
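A sketch of this bisection-on-angle variant. One liberty taken: rather than deriving cos(a/2) and sin(a/2) on the fly (which, via the half-angle formulas, would reintroduce a square root), the sines and cosines of the halved angles are precomputed offline into a table -- which is what a hardware pipeline would do anyway. The sign of the cross-product z-coordinate Xa*Yb - Xb*Ya steers each step.

```python
import math

def rotate_by_bisection(A, B, iters=48):
    """Rotate A onto the direction of B by binary search on the angle.
    Stage k rotates by +/- pi/2^k; the sign of the z-coordinate of
    A' x B (i.e. x*By - Bx*y) picks the direction.  These are exact
    rotations, so |A| is preserved and there is no gain constant."""
    x, y = A
    # (cos, sin) of pi/2, pi/4, pi/8, ...: a fixed table in hardware.
    table = [(math.cos(math.pi / 2**k), math.sin(math.pi / 2**k))
             for k in range(1, iters + 1)]
    for c, s in table:
        if x * B[1] - B[0] * y >= 0:      # B is CCW of A': rotate CCW
            x, y = x*c - y*s, x*s + y*c
        else:                             # B is CW of A': rotate CW
            x, y = x*c + y*s, y*c - x*s
    return (x, y)
```

Like CORDIC, the rotation is non-restoring (every stage rotates one way or the other), and the residual angle after n stages is at most pi/2^n.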
Reply by ●February 27, 2006
Greg Berchin wrote:

> I have two two-dimensional vectors of arbitrary length and direction;
> call them A and B. I need to find vector A' that is the same length as
> A but in the direction of B. Of course, I could simply compute
> A' = (|A|/|B|)*B
> but that requires a square root and a divide. I thought about using
> some variant of CORDIC to rotate A onto B, but I wonder if there's a
> better way.

A method that I originally saw in the ADSP-2101 literature, but which should port well to an FPGA, is to just do a polynomial expansion, i.e.

  y = a_0 + a_1 x + a_2 x^2 + ... + a_n x^n.

This reduces to keeping a running result for x^n and doing a MAC for each a_n * x^n; with two multipliers you should be able to do it in not much more than n+1 clock cycles, and n shouldn't have to be too big (I'm guessing 10 or so). You could save a multiplier by doubling the time, of course, or use 2n multipliers and do it in one computation per clock, if you had to.

The cool thing about the way they did it was that they did _not_ try to do a Taylor series. Instead, they started from the recognition that the series was going to be truncated, and found the least-squares fit for the coefficients (this is easy to do in just about any math package -- you can even coerce Excel into doing it). This should work well both for square roots and for 1/|B|.

In your case I would start by finding a rough root (or inverse) by shifting. For the square root find N such that x_r * 2^N = x with 0.25 < x_r <= 1 and N even; for the inverse find N such that x_r * 2^N = x with 0.5 < x_r <= 1. Then sqrt(x) = 2^(N/2) * sqrt(x_r) and 1/x = 2^-N / x_r. This means you can restrict the number of x_r^n terms you need while still maintaining accuracy.

Depending on your speed/area requirements you can find N in anywhere from 1 to N clock cycles.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Posting from Google? See http://cfaj.freeshell.org/google/
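A sketch of the range-reduction half of Tim's recipe, shown for 1/x. For brevity the fixed polynomial here is just the truncated geometric series in t = 1 - x_r (valid since t < 1/2 once x_r is in (1/2, 1]); a least-squares fit as Tim describes would reach the same accuracy with far fewer terms.

```python
def recip_poly(x, terms=24):
    """1/x with no divide: normalize x = x_r * 2^N with 0.5 < x_r <= 1,
    evaluate a fixed polynomial for 1/x_r, then undo the scaling via
    1/x = 2^-N / x_r.  Fixed term count => directly pipelineable."""
    assert x > 0.0
    N, x_r = 0, x
    while x_r > 1.0:            # in hardware: find N by shifting
        x_r *= 0.5; N += 1
    while x_r <= 0.5:
        x_r *= 2.0; N -= 1
    t = 1.0 - x_r               # 0 <= t < 0.5
    acc = 0.0
    for _ in range(terms):      # Horner evaluation of 1 + t + ... + t^(terms-1)
        acc = acc * t + 1.0
    return acc * 2.0**(-N)
```

With 24 terms the truncation error is bounded by 0.5^24/(1 - 0.5), i.e. a relative error around 6e-8, inside the 1 ppm budget even at the worst corner of the reduced range.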
Reply by ●February 27, 2006
On Mon, 27 Feb 2006 13:39:50 -0500, Carlos Moreno <moreno_at_mochima_dot_com@mailinator.com> wrote:

> I'm pretty certain that for the square root, N-R is guaranteed
> to work (don't know of any proof, but I wouldn't be surprised
> if it has been rigorously proven)

It's pretty intuitive. As long as the function used for N-R has no inflection points near the zero crossing, and the initial estimate is well away from any local minimum or maximum, it will converge. For the square root I would use x^2 - K = 0, which is well-behaved.

> After posting, I thought of a similar method based on the same
> idea -- instead of the dot product, use the vector product (cross
> product)

I also experimented with a similar idea. From what I've seen, though, it looks like CORDIC will be at least as simple. The nice thing about CORDIC is that it only uses shifts and adds, with multiplication by a gain constant applied after all of the other processing is complete. CORDIC can also compute arctan(y/x) directly. So now I'm thinking maybe use the fact that tan(theta) = |A x B| / (A . B) to allow CORDIC to rotate the vector.

Another point about a binary search, which occurred to me after my last message, is that it is so very similar to a one-bit divide iteration that I might as well just do the division.

> Anyway, I kind of doubt that either one of these two approaches
> for binary search may be more efficient than N-R for the sqrt
> and the inverse;

Yeah, it's looking that way. Many thanks again for your input.

Greg Berchin
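A behavioural sketch of the CORDIC idea: micro-rotations by arctan(2^-i), direction chosen by the sign of the cross product with B, and a single gain correction at the end. In fixed point every step is a shift and add; floats are used here only for clarity. The +/-180-degree pre-fold is an assumption of mine to cover the full angle range, since the micro-rotations alone only reach about +/-99.9 degrees.

```python
def cordic_onto(A, B, iters=40):
    """Rotate A onto the direction of B with CORDIC micro-rotations.
    The sign of A' x B steers each stage; the gain K is a constant
    because every stage rotates, so one multiply at the end undoes it."""
    x, y = A
    if x * B[0] + y * B[1] < 0:     # A . B < 0: fold by 180 degrees first,
        x, y = -x, -y               # leaving at most 90 degrees to rotate
    inv_gain = 1.0                  # 1/K: a precomputed constant in hardware
    for i in range(iters):
        inv_gain /= (1.0 + 4.0**(-i)) ** 0.5
        d = 1.0 if x * B[1] - B[0] * y >= 0 else -1.0
        # micro-rotation by +/- arctan(2^-i): shift and add in fixed point
        x, y = x - d * y * 2.0**(-i), y + d * x * 2.0**(-i)
    return (x * inv_gain, y * inv_gain)
```

The residual angle after n stages is about arctan(2^-(n-1)), so 40 stages leaves direction errors far below 1 ppm.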
Reply by ●February 27, 2006
"Carlos Moreno" <moreno_at_mochima_dot_com@mailinator.com> wrote in message news:lGHMf.11242$rg2.537803@weber.videotron.net...

> I'm pretty certain that for the square root, N-R is guaranteed
> to work (don't know of any proof, but I wouldn't be surprised
> if it has been rigorously proven) -- if you start with an initial
> estimate of x (for the sqrt(x)) if x > 1, and 1 if x < 1.

Hello Guys,

Actually the Newton-Raphson method is quadratically convergent in the limit, near the point of convergence. However, it can be painfully slow. For example, try using it for sqrt(x) with x near zero: here the derivative causes a real problem. The region of convergence is very small for small x, and it is easy to have a seed fall outside of this region, hence your algo diverges.

One alternative is to find 1/sqrt(x); another is an algo for directly finding the square root by pairing bits and using trial subtractions. I've seen this actually done in hardware for a quick square root. Each iteration yields one bit of the square root.

Clay
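The bit-pairing, trial-subtraction square root Clay mentions is the classic digit-by-digit method; a sketch for unsigned integers. Each loop pass brings down one bit pair of the radicand and produces one bit of the root, using only shifts, a compare, and a subtract -- which is why it maps so well to hardware.

```python
def isqrt_bitpair(n, bits=32):
    """floor(sqrt(n)) for 0 <= n < 2**bits; one root bit per iteration."""
    root, rem = 0, 0
    for i in range(bits // 2 - 1, -1, -1):
        rem = (rem << 2) | ((n >> (2 * i)) & 3)  # bring down next bit pair
        trial = (root << 2) | 1                  # trial subtrahend: 4*root + 1
        root <<= 1
        if rem >= trial:                         # trial subtraction succeeds:
            rem -= trial                         # accept it and set this
            root |= 1                            # bit of the root
    return root
```

As Clay notes, the iteration count (and hence precision) is known a priori, so the loop unrolls directly into a pipeline of identical stages.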
Reply by ●February 27, 2006
Clay S. Turner wrote:

>> I'm pretty certain that for the square root, N-R is guaranteed
>> to work [...]
>
> Actually the Newton - Raphson method is quadratically convergent in the
> limit of the point of convergence. However, it can be painfully slow. For
> example try using this for sqrt(x) with x near zero. Here the derivative
> causes a real problem. The region of convergence is very small for x small.
> And it is easy to have a seed fall outside of this region

Huh?? You cannot fall outside the region of convergence if you are at a value greater than the square root. That's why if you start at x = 1 (to compute x = sqrt(a) where a < 1), there is no way that the algorithm can diverge.

In fact, now that I think about it, for *any* value greater than the solution the algorithm converges, which means that for any starting value > 0 the algorithm converges (floating-point inaccuracy and rounding errors aside), since in the worst case the first iteration will send the value from a very small number to a very large one, and from that large value we have the guarantee that we'll converge.

For instance, it takes roughly 30 iterations to obtain the square root of 0.01 with an initial estimate of 10 million (the iterates just halve until they get near the root). Roughly 23 iterations if we start at 0.0000001, since after the first iteration the estimate jumps to 50000 -- and from there it goes monotonically down, approaching 0.1.

If you start at 1, it takes 7 iterations to get to 0.1 (with 6 decimal places accuracy) -- I guess that's quite slow, depending on the application.

Binary search, I guess?

Carlos
--
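The iteration under discussion, for anyone who wants to reproduce the counts (a toy model; the seed x0 and iteration count are parameters):

```python
def nr_sqrt(a, x0, iters):
    """Newton-Raphson on f(x) = x^2 - a:  x <- (x + a/x) / 2.
    For any seed > 0 the iterate is >= sqrt(a) after one step and then
    decreases monotonically toward sqrt(a)."""
    x = x0
    for _ in range(iters):
        x = 0.5 * (x + a / x)
    return x
```

From x0 = 1, seven iterations pin sqrt(0.01) to six decimal places, matching the count in the post; from a seed far above the root, each step merely halves the iterate until it nears sqrt(a), which is where the large iteration counts come from.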
Reply by ●February 28, 2006
"Carlos Moreno" <moreno_at_mochima_dot_com@mailinator.com> wrote in message news:5aOMf.25582$Qs1.604698@wagner.videotron.net...

> Huh?? You can not fall outside the region of convergence if you
> are at a value that is greater than the square root.
> [...]
> If you start at 1, it takes 7 iterations to get to 0.1 (with
> 6 decimal places accuracy) -- I guess that's quite slow,
> depending on the application.
>
> Binary search, I guess?

Hello Carlos,

One of the standard tricks in hardware is to have a fixed number of iterations, so that the algo can be pipelined. Using the constraint of a fixed number of iterations is how I arrived at a region of convergence that gets smaller for small x. The region is determined as the domain of seeds such that, after the fixed number of iterations, the result is within a prespecified epsilon of the true answer.

The algo that pairs bits and uses trial subtractions will produce a known precision given an a priori known number of iterations. In this algo, the square root is the same order of complexity as a division.

Clay
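Clay's fixed-iteration point, sketched for the reciprocal: normalize the input into [1/2, 1), seed with the standard minimax linear estimate 48/17 - 32/17*m, and run a hard-wired number of Newton-Raphson steps. The seed's relative error is at most 1/17 and squares on every step, so four steps land far below 1 ppm for every operand -- same latency, same path through the pipeline. (The step count and seed choice here are mine.)

```python
def recip_fixed(b, steps=4):
    """1/b via a fixed number of Newton-Raphson steps, y <- y*(2 - b*y):
    multiply-and-subtract only, no divide, identical latency per input."""
    assert b > 0.0
    e, m = 0, b
    while m >= 1.0:                   # normalize: b = m * 2^e, 0.5 <= m < 1
        m *= 0.5; e += 1
    while m < 0.5:
        m *= 2.0; e -= 1
    y = 48.0/17.0 - (32.0/17.0) * m   # seed: max relative error 1/17
    for _ in range(steps):            # relative error squares each step
        y = y * (2.0 - m * y)
    return y * 2.0**(-e)
```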
Reply by ●February 28, 2006
On Mon, 27 Feb 2006 11:24:36 -0800, Tim Wescott <tim@seemywebsite.com> wrote:

> just do a polynomial expansion, i.e.
>
> y = a_0 + a_1 x + a_2 x^2 + ... + a_n x^n.
>
> This reduces to keeping a running result for x^n and doing a MAC for the
> a_n * x^n; with two multipliers you should be able to do it in not too
> much more than n+1 clock cycles,

I have to walk a line between using more "simple" operations (such as shifts and adds) and using fewer "complicated" operations (such as multiplications). Also, as was mentioned in someone else's response, in an FPGA or ASIC it's nice for the number of iterations (and the configuration of the algorithm) to be fixed, so that the design can be pipelined.

I didn't completely follow your suggestion for how to expand sqrt and 1/x as polynomials. Is there a way to make the algorithm "fixed", so that the same hardware can be used for all cases?

Thanks,
Greg
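For what it's worth, the pieces discussed in this thread can be combined so the whole job needs no divide and no square root at all, with a fixed iteration count throughout. The identity is |A|/|B| = |A|^2 * (1/sqrt(|A|^2 * |B|^2)), and the inverse square root has its own divide-free Newton step, y <- y*(3 - p*y^2)/2. The seed and step count below are my choices, not anything posted in the thread.

```python
def same_length_along(A, B, steps=5):
    """A' with |A'| = |A| in the direction of B; no sqrt, no divide.
    Scale factor: s = |A|/|B| = |A|^2 * rsqrt(|A|^2 * |B|^2)."""
    a2 = A[0]*A[0] + A[1]*A[1]        # |A|^2
    b2 = B[0]*B[0] + B[1]*B[1]        # |B|^2
    assert a2 > 0.0 and b2 > 0.0
    p, e = a2 * b2, 0
    while p >= 1.0:                   # normalize by 4s: rsqrt scales by 2^-e
        p *= 0.25; e += 1
    while p < 0.25:
        p *= 4.0; e -= 1
    y = 2.0 - p                       # crude seed for 1/sqrt(p), p in [0.25, 1)
    for _ in range(steps):            # divide-free Newton step for rsqrt
        y = y * (1.5 - 0.5 * p * y * y)
    s = a2 * y * 2.0**(-e)            # = |A| / |B|
    return (s * B[0], s * B[1])
```

The normalization shifts by pairs of bits so the exponent comes back out as a clean power of two, and the fixed step count keeps the datapath pipelineable, per Clay's point.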