Hello everybody,

For my diploma thesis I have to implement a least-mean-square (LMS) algorithm on a fixed-point DSP (TI 6416). The LMS was implemented earlier on a floating-point processor (TI 6713), so I just took that code and copied it. Of course, there are a lot of float variables in the code. When I ran the program, it worked for small FIR orders (6), but the larger the order of the filter, the worse the result. Is this because the 6416 cannot work with floating-point numbers accurately? What can I do? When I convert all the float variables to integers I get overflow problems.

Thanks a lot,
Daniel
Porting LMS from floating-point to fixed-point processor
Started by ●March 3, 2005
Reply by ●March 3, 2005
Daniel wrote:
> Hello Everybody,
>
> for my diploma thesis, I have to implement a Least-Mean-Square
> Algorithm on a fixed-point DSP (TI 6416). The LMS was implemented on a
> floating-point processor (TI 6713) earlier, so I just took the code and
> copied it. Of course, there are a lot of float variables in the code.
> When I ran the program, it worked for small FIR orders (6), but the
> larger the order of the filter, the worse the result.
> Is this because the 6416 cannot work with floating-point numbers
> accurately?
> What can I do? When I convert all the float variables to integers I
> get overflow problems.

I've never had to think about implementing an LMS algorithm in fixed-point, so here are some general comments:

The way to work with fixed-point arithmetic is to determine the ranges of your data and your coefficients, then jigger things around to keep the results (final _and_ intermediate) in range. Everything else flows from that.

Generally if you stick to pure C you are stuck with integer math. DSPs are designed to do fixed-radix math pretty quickly, and TI has a library to deal with that (can't remember the name -- someone jump in here and name it!). At its most basic a DSP will do a vector dot product of a bunch of integers with a fixed shift -- this is usually enough to get you anything you need, if you're clever enough.

Look at how you can arrange things to use a fractional vector multiply. This is what DSPs are built for (look for the MAC instruction), so it should at least be possible. I've been consistently disappointed by the libraries that come with DSP tools, but perhaps the 6416 is different -- and if not there's always assembly. IIRC the LMS algorithms that I've seen reduce to finding a vector dot product or three and finding one scalar reciprocal. All these things are possible to do _fast_ with a fixed-point DSP.
Don't be afraid to use 32-bit data paths -- even a 16-bit DSP should be made so that you can do four 16-bit vector dot products and combine them into one 32-bit result.

Keep in mind that you can use block floating point. The ADSP-2101 documentation has a fine discussion of this format. That processor has a vector normalization instruction -- your processor should as well. Block floating point is a very good compromise between "real" floating point and all-fixed-point math.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
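A minimal sketch of the block floating point idea mentioned above, written in portable C rather than for any particular DSP. The function name `block_normalize` and the 16-bit data width are assumptions made for illustration; nothing here is taken from the ADSP-2101 documentation or from a TI library. The whole block shares one exponent: find the largest left shift that keeps every element inside 16 bits, apply it, and let the caller remember the shift.

```c
#include <stdint.h>
#include <stddef.h>

/* Scale a block of 16-bit samples up by a common power of two so that the
 * largest element uses the available headroom. Returns the shift applied;
 * the original values are the stored values divided by 2^shift.
 * An all-zero block reports the maximum shift of 15. */
static int block_normalize(int16_t *x, size_t n)
{
    int32_t peak = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t v = x[i];
        /* For negative v use -v-1, so that -32768 maps to 32767. */
        int32_t mag = (v < 0) ? (-v - 1) : v;
        if (mag > peak)
            peak = mag;
    }

    /* A shift of s is safe while peak < 2^(15 - s). */
    int shift = 0;
    while (shift < 15 && peak < ((int32_t)1 << (14 - shift)))
        shift++;

    /* Multiply instead of shifting, since left-shifting a negative
     * signed value is undefined behaviour in C. */
    for (size_t i = 0; i < n; i++)
        x[i] = (int16_t)(x[i] * ((int32_t)1 << shift));

    return shift;
}
```

Calling it on `{100, -200, 50}` scales the block by 2^7, giving `{12800, -25600, 6400}`. On a DSP with a normalization instruction the peak search would presumably be done in hardware; this version only shows the bookkeeping.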
Reply by ●March 3, 2005
Daniel,

I would guess the reason you are not getting good results for larger orders is that the processor is "running out of cycles". In other words, the floating-point operations are taking too many MIPS when run on a native fixed-point machine. (This does not surprise me.) What is your sample rate?

You need to get a grasp on the concepts of Q-types and Q-arithmetic rules and get rid of all "floats" from your code.

The basic part of the update equation for LMS tap n, at time k is

    Atap(n) = Atap(n) + mu*X(n)*e(k);

To do the above in fixed-point Q15 arithmetic (using pseudo code):

    q15_mu_e = HIWORD( (q15_mu*q15_e) << 1 );
    for (n = 0 : N-1)
        Atap[n] = Atap[n] + HIWORD( (q15_mu_e*q15_x[n]) << 1 );

If you want to stay in the C domain (and not learn 64x assembly), you may need to learn the compiler intrinsics for getting proper saturation/overflow results, proper shifts, maintaining 32 bits of data, etc.

    // For example, on the C55: (this is real C code)
    int16 q15_mu_e;
    int16 q15_mu;
    int16 q15_e;
    int16 q15_atap[NTAPS];
    int16 q15_x[NTAPS];
    int   n;
    .
    .
    // do the LMS update
    q15_mu_e = _smpy(q15_mu, q15_e);   // multiply, shift left by one, take upper 16 bits
    for (n = 0; n < NTAPS; n++)
        q15_atap[n] += _smpy(q15_mu_e, q15_x[n]);

I haven't used the 64x family, but I imagine it has similar intrinsics, or a section on how to properly use type casting in a C statement to get the result you need.

The above implementation does not do any rounding and does not maintain any intermediate value as 32 bits, so it is by no means going to give you the best results. Also, it doesn't cover the circular buffer updates or the LMS filter output calculation, so it's not going to work without adding that functionality too.

Check the TI web sites for app notes on the LMS algorithm for the C54x or C55 family; they should provide some more insight into doing LMS on a fixed-point machine (they may also provide material relating to Q-types and the like).

Good luck. You have a fair amount of work ahead of you.
-Shawn

"Daniel" <d.lohausen@freenet.de> wrote in message news:e83ccc31.0503030745.68a487cc@posting.google.com...
> Hello Everybody,
>
> for my diploma thesis, I have to implement a Least-Mean-Square
> Algorithm on a fixed-point DSP (TI 6416). The LMS was implemented on a
> floating-point processor (TI 6713) earlier, so I just took the code and
> copied it. Of course, there are a lot of float variables in the code.
> When I ran the program, it worked for small FIR orders (6), but the
> larger the order of the filter, the worse the result.
> Is this because the 6416 cannot work with floating-point numbers
> accurately?
> What can I do? When I convert all the float variables to integers I
> get overflow problems.
>
> Thanks a lot
> Daniel
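Shawn's post notes that his snippet skips rounding, 32-bit intermediates and the filter output. The following is one possible way to fill in those gaps in portable C, offered as a sketch and not as TI's or Shawn's implementation. The names `sat16` and `lms_step_q15`, the desired-signal input `d`, the error definition e = d - y, and the newest-first layout of `x` are all assumptions. It also assumes that right-shifting a negative value is an arithmetic shift, which mainstream compilers provide but the C standard does not guarantee.

```c
#include <stdint.h>
#include <stddef.h>

/* Saturate a wide value to the 16-bit Q15 range. */
static int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* One LMS iteration in Q15.
 * w     : ntaps coefficients, updated in place
 * x     : delay line, x[0] newest sample, x[n] the sample n steps back
 *         (the caller maintains it; no circular buffer here)
 * d     : desired sample, mu : step size, both Q15
 * err   : receives the error e = d - y
 * Returns the filter output y. */
static int16_t lms_step_q15(int16_t *w, const int16_t *x, size_t ntaps,
                            int16_t d, int16_t mu, int16_t *err)
{
    /* Filter output: accumulate Q30 products in a wide accumulator,
     * then round and scale back to Q15 once at the end. */
    int64_t acc = 0;
    for (size_t n = 0; n < ntaps; n++)
        acc += (int32_t)w[n] * x[n];
    int16_t y = sat16((acc + (1 << 14)) >> 15);

    int16_t e = sat16((int32_t)d - y);

    /* Keep mu*e at full 32-bit (Q30) precision instead of cutting it
     * back to 16 bits before the per-tap multiply. */
    int32_t mu_e = (int32_t)mu * e;

    for (size_t n = 0; n < ntaps; n++) {
        /* Q30 * Q15 = Q45; round and shift by 30 to get back to Q15. */
        int64_t delta = ((int64_t)mu_e * x[n] + ((int64_t)1 << 29)) >> 30;
        w[n] = sat16((int64_t)w[n] + delta);
    }

    *err = e;
    return y;
}
```

With `w = {0, 0}`, `x = {16384, 0}`, `d = 16384` and `mu = 16384` (all 0.5 in Q15), the first call returns 0 and moves `w[0]` to 4096, which is 0.5 * 0.5 * 0.5 = 0.125 in Q15. Whether a given compiler turns the loops into efficient MAC code is a separate question that this sketch does not address.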
Reply by ●March 3, 2005
Tim Wescott <tim@wescottnospamdesign.com> writes:
> [...]
> Generally if you stick to pure C you are stuck with integer math.
> DSPs are designed to do fixed-radix math pretty quickly, ...

Tim,

I think most of your points are helpful, but this one is off the mark in my judgment. The typical fixed-point DSP operates much the same as the C integer operations, performing integer math. Whether the integers are reinterpreted to be fractional, fixed-point, or integer is all in the interpretation and has little or nothing to do with the implementation of the basic arithmetic operations (add, subtract, multiply).

Of course there are differences between fixed-point DSP ALUs and the "ALU" of a C compiler, the biggest of which are probably the wide accumulators and the saturation options when performing various operations. There is also the typical "left shift by 1" that a fractional DSP does after a multiply to make the result fractional, but that is certainly doable in C as well, albeit manually.

--
Randy Yates
Sony Ericsson Mobile Communications
Research Triangle Park, NC, USA
randy.yates@sonyericsson.com, 919-472-1124
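To make the "doable in C, albeit manually" remark concrete, here is a small sketch of a fractional multiply built from an ordinary integer multiply. The function names are made up for this example. Shifting the 32-bit product left by one and keeping the upper 16 bits is the same as shifting it right by 15, which is how it is written here to avoid left-shifting a negative value. The sketch assumes the compiler implements right shift of negative values as an arithmetic shift.

```c
#include <stdint.h>

/* Plain integer view: 16 x 16 -> 32-bit product, nothing else. */
static int32_t mul_int(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* Fractional (Q15) view of the same product. The Q30 result carries two
 * sign bits; dropping one of them ("left shift by 1") and keeping the
 * upper 16 bits yields a Q15 result, truncated rather than rounded. */
static int16_t mul_q15(int16_t a, int16_t b)
{
    int32_t p = mul_int(a, b);

    /* -1.0 * -1.0 is the only input pair whose result (+1.0) does not
     * fit in Q15, so saturate it by hand. */
    if (p == 0x40000000)
        return INT16_MAX;

    return (int16_t)(p >> 15);
}
```

For example, `mul_q15(16384, 16384)` gives 8192, that is 0.5 * 0.5 = 0.25, while `mul_int` on the same inputs gives the raw product 268435456. The bits come from the same multiply; only the choice of which 16 bits to keep differs.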
Reply by ●March 3, 2005
Randy Yates wrote:
> Tim Wescott <tim@wescottnospamdesign.com> writes:
>
>> [...]
>> Generally if you stick to pure C you are stuck with integer math.
>> DSPs are designed to do fixed-radix math pretty quickly, ...
>
> Tim,
>
> I think most of your points are helpful, but this one is off the mark
> in my judgment. The typical fixed-point DSP operates much the same as
> the C integer operations, performing integer math. Whether the
> integers are reinterpreted to be fractional, fixed-point, or integer
> is all in the interpretation and has little or nothing to do with the
> implementation of the basic arithmetic operations (add, subtract,
> multiply).
>
> Of course there are differences between fixed-point DSP ALUs and the
> "ALU" of a C compiler, the biggest of which are probably the wide
> accumulators and the saturation options when performing various
> operations. There is also the typical "left shift by 1" that a
> fractional DSP does after a multiply to make the result fractional,
> but that is certainly doable in C as well, albeit manually.

Doesn't the usual fixed-point hardware do a shift after multiplying? "Redundant sign bit" and all that.

Jerry
--
Engineering is the art of making what you want from things you can get.
Reply by ●March 3, 2005
Jerry Avins <jya@ieee.org> writes:
> Doesn't the usual fixed-point hardware do a shift after multiplying?
> "Redundant sign bit" and all that.

Depends on the processor. The TI TMS3205xx series does not by default (there is a register for setting this behavior). The Motorola does.

--
Randy Yates
Sony Ericsson Mobile Communications
Research Triangle Park, NC, USA
randy.yates@sonyericsson.com, 919-472-1124
Reply by ●March 3, 2005
Tim Wescott wrote:
> Daniel wrote:
> > Hello Everybody,
> >
> > for my diploma thesis, I have to implement a Least-Mean-Square
> > Algorithm on a fixed-point DSP (TI 6416). The LMS was implemented on a
> > floating-point processor (TI 6713) earlier, so I just took the code and
> > copied it. Of course, there are a lot of float variables in the code.
> > When I ran the program, it worked for small FIR orders (6), but the
> > larger the order of the filter, the worse the result.
> > Is this because the 6416 cannot work with floating-point numbers
> > accurately?
> > What can I do? When I convert all the float variables to integers I
> > get overflow problems.
> >
> I've never had to think about implementing an LMS algorithm in
> fixed-point, so here are some general comments:
>
> The way to work with fixed-point arithmetic is to determine the ranges
> of your data and your coefficients, then jigger things around to keep
> the results (final _and_ intermediate) in range. Everything else flows
> from that.
>
> Generally if you stick to pure C you are stuck with integer math. DSPs
> are designed to do fixed-radix math pretty quickly, and TI has a library
> to deal with that (can't remember the name -- someone jump in here and
> name it!). At its most basic a DSP will do a vector dot product of a
> bunch of integers with a fixed shift -- this is usually enough to get
> you anything you need, if you're clever enough.
>
> Look at how you can arrange things to use a fractional vector multiply.
> This is what DSPs are built for (look for the MAC instruction), so it
> should at least be possible. I've been consistently disappointed by the
> libraries that come with DSP tools, but perhaps the 6416 is different --
> and if not there's always assembly. IIRC the LMS algorithms that I've
> seen reduce to finding a vector dot product or three and finding one
> scalar reciprocal. All these things are possible to do _fast_ with a
> fixed-point DSP.
>
> Don't be afraid to use 32-bit data paths -- even a 16-bit DSP should be
> made so that you can do four 16-bit vector dot products and combine them
> into one 32-bit result.
>
> Keep in mind that you can use block floating point. The ADSP-2101
> documentation has a fine discussion of this format. That processor has
> a vector normalization instruction -- your processor should as well.
> Block floating point is a very good compromise between "real" floating
> point and all-fixed-point math.
>
> --
>
> Tim Wescott
> Wescott Design Services
> http://www.wescottdesign.com

TI uses the name 'dsplib' for their free code libraries. The '54X library has a fixed-point LMS function. The routines are 100% assembly, and source is provided. If nothing else they are a good starting point. I have used the FFT and FIR routines quite a bit.

The 21xx normalization instruction, if I recall correctly, automatically keeps track of the worst-case shift as you traverse a vector. The '54X version does not do that.

John
Reply by ●March 3, 2005
Randy Yates wrote:
> Jerry Avins <jya@ieee.org> writes:
>
>> Doesn't the usual fixed-point hardware do a shift after multiplying?
>> "Redundant sign bit" and all that.
>
> Depends on the processor. The TI TMS3205xx series does not by default
> (there is a register for setting this behavior). The Motorola does.

Default or not, it isn't behavior one gets automatically from int operations in C. That was my point.

Jerry
--
Engineering is the art of making what you want from things you can get.
Reply by ●March 4, 2005
Randy Yates wrote:
> Tim Wescott <tim@wescottnospamdesign.com> writes:
>
>> [...]
>> Generally if you stick to pure C you are stuck with integer math.
>> DSPs are designed to do fixed-radix math pretty quickly, ...
>
> Tim,
>
> I think most of your points are helpful, but this one is off the mark
> in my judgment. The typical fixed-point DSP operates much the same as
> the C integer operations, performing integer math. Whether the
> integers are reinterpreted to be fractional, fixed-point, or integer
> is all in the interpretation and has little or nothing to do with the
> implementation of the basic arithmetic operations (add, subtract,
> multiply).
>
> Of course there are differences between fixed-point DSP ALUs and the
> "ALU" of a C compiler, the biggest of which are probably the wide
> accumulators and the saturation options when performing various
> operations. There is also the typical "left shift by 1" that a
> fractional DSP does after a multiply to make the result fractional,
> but that is certainly doable in C as well, albeit manually.

The difference in clock ticks between implementing a fixed-point arbitrary-radix vector dot product in assembly on a DSP and trying to do the same thing to the same precision in C on the same processor is on the order of 100:1. Even on a MAC-less processor, when you are in assembly and multiply two signed N-bit numbers you can choose to take the lower N bits of the 2N-1-bit result as C does, or you can take the upper N-1 bits and do a shift, with way fewer clock cycles (10 or 20:1) than you could implement the same functionality in C.

I should know -- I've done it in C a couple of times and in assembly on three or four different processors.

So no, I don't think it's off the mark at all.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com
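For readers who want to see what the C side of this comparison looks like, here is a sketch of a fixed-point dot product with a selectable binary point. The name `dot_fixed` and the choice of a 64-bit accumulator to stand in for a DSP's wide accumulator are assumptions. The sketch says nothing about cycle counts; how well a given compiler maps it onto MAC hardware is exactly the point under discussion above. It assumes arithmetic right shift of negative values.

```c
#include <stdint.h>
#include <stddef.h>

/* Dot product of two 16-bit vectors with the result scaled down by
 * 2^shift (0 <= shift < 63), rounded, and saturated to 32 bits.
 * With Q15 inputs, shift = 15 gives a Q15 result held in 32 bits. */
static int32_t dot_fixed(const int16_t *a, const int16_t *b,
                         size_t n, int shift)
{
    int64_t acc = 0;                       /* wide accumulator */

    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];       /* 16 x 16 -> 32-bit products */

    if (shift > 0)                         /* round to nearest, then scale */
        acc = (acc + ((int64_t)1 << (shift - 1))) >> shift;

    if (acc > INT32_MAX) return INT32_MAX; /* saturate instead of wrapping */
    if (acc < INT32_MIN) return INT32_MIN;
    return (int32_t)acc;
}
```

With `a = {16384, 16384}` and `b = {16384, -8192}`, a shift of 15 returns 4096: 0.5 * 0.5 + 0.5 * (-0.25) = 0.125 in Q15.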
Reply by ●March 4, 2005
Daniel wrote:
> Hello Everybody,
>
> for my diploma thesis, I have to implement a Least-Mean-Square
> Algorithm on a fixed-point DSP (TI 6416). The LMS was implemented on a
> floating-point processor (TI 6713) earlier, so I just took the code and
> copied it. Of course, there are a lot of float variables in the code.
> When I ran the program, it worked for small FIR orders (6), but the
> larger the order of the filter, the worse the result.
> Is this because the 6416 cannot work with floating-point numbers
> accurately?
> What can I do? When I convert all the float variables to integers I
> get overflow problems.
>
> Thanks a lot
> Daniel

Daniel,

You can't just convert floating-point variables into integers and expect it to work fine. What you ought to do is represent the floating-point numbers in signed-mantissa-exponent format (the signed mantissa and the exponent can both be stored as integers). Then you need to write little routines for the basic arithmetic operations, such as multiply(), divide(), add() and subtract(), that you can use to replace the * / + and - in your code.

The TI compiler automatically does this for you and links in a floating-point library when you declare "float" variables. However, based on the requirements of your code (precision, range, etc.) you can choose to write your own functions that could be faster and optimized for your application.

Regards,
Ravi
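A rough sketch of the signed-mantissa-plus-exponent idea, assuming a 16-bit mantissa and a value defined as mant * 2^exp. The type `sfloat` and the routines `sf_normalize`, `sf_mul` and `sf_add` are hypothetical names for this example; this is not how TI's runtime library represents floats, and it ignores exponent overflow, rounding modes and division.

```c
#include <stdint.h>

/* value = mant * 2^exp. A nonzero value is kept normalized so that
 * 16384 <= |mant| <= 32767; zero is stored as {0, 0}. */
typedef struct {
    int16_t mant;
    int16_t exp;
} sfloat;

/* Bring an arbitrary 32-bit mantissa into the normalized 16-bit range,
 * adjusting the exponent so the represented value stays the same
 * (apart from low bits dropped by truncation). */
static sfloat sf_normalize(int32_t m, int exp)
{
    sfloat r = {0, 0};
    if (m == 0)
        return r;
    while (m > 32767 || m < -32767) {   /* too wide: drop low bits */
        m /= 2;
        exp++;
    }
    while (m < 16384 && m > -16384) {   /* too narrow: use the headroom */
        m *= 2;
        exp--;
    }
    r.mant = (int16_t)m;
    r.exp = (int16_t)exp;
    return r;
}

/* Multiply: mantissas multiply, exponents add. */
static sfloat sf_mul(sfloat a, sfloat b)
{
    return sf_normalize((int32_t)a.mant * b.mant, a.exp + b.exp);
}

/* Add: align both operands to the smaller exponent, then add mantissas. */
static sfloat sf_add(sfloat a, sfloat b)
{
    if (a.mant == 0) return b;
    if (b.mant == 0) return a;
    if (a.exp < b.exp) {                /* make a the larger-exponent operand */
        sfloat t = a; a = b; b = t;
    }
    int d = a.exp - b.exp;
    if (d > 15)                         /* b is below a's resolution */
        return a;
    /* Scaling a up by 2^d (d <= 15) still fits in 32 bits. */
    int32_t ma = (int32_t)a.mant * ((int32_t)1 << d);
    return sf_normalize(ma + b.mant, b.exp);
}
```

For example, `sf_normalize(3, 0)` stores 3.0 as mantissa 24576 with exponent -13, and multiplying that by `sf_normalize(1, -1)` (0.5) yields mantissa 24576 with exponent -14, which is 1.5. Every operation pays for a normalization loop, which hints at why software floating point on a fixed-point DSP can be slow.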