"Shawn Steenhagen" <shawn.NSsteenhagen@NSappliedsignalprocessing.com> writes:

> Randy > > > > From our recent phone conversation and e-mails regarding this post, I > understand why you were offended. My intent was never to insult or offend > you. My intent was to simply share information. I respect you and all who > post on the comp.dsp user group and I sincerely apologize for offending you.

Thank you, Shawn. Any young man who can humble himself and make a public apology as you have just done has regained every ounce of my respect. -- Randy Yates Sony Ericsson Mobile Communications Research Triangle Park, NC, USA randy.yates@sonyericsson.com, 919-472-1124

Randy From our recent phone conversation and e-mails regarding this post, I understand why you were offended. My intent was never to insult or offend you. My intent was to simply share information. I respect you and all who post on the comp.dsp user group and I sincerely apologize for offending you. Sincerely Shawn K. Steenhagen Applied Signal Processing, Inc. www.appliedsignalprocessing.com "Randy Yates" <yates@ieee.org> wrote in message news:7jkmoyuf.fsf@ieee.org...

> "Shawn Steenhagen" <shawn.NSsteenhagen@NSappliedsignalprocessing.com>

writes:

> > > Randy, > > > > Careful about that extra sign bit. > > Shawn, your warning is ludicrous. My paper on scaling in FIR > fixed-point arithmetic - a reference for many who have visited this > group looking for information on the topic - discusses this point. I > wrote it years before I ever heard of your name on this group. I am > also the author of the comp.dsp FAQ item on fixed-point arithmetic > and as such am well-versed on the issue. > > I noticed your tendency to assume I was ignorant at the comp.dsp > conference. Perhaps you should reexamine your assumptions to save > yourself further embarassment. > > I'm not going to honor your point with a response except to say that > you have given me no new information and I still stand by everything > I have asserted in this thread. > -- > % Randy Yates % "She's sweet on Wagner-I think she'd die

for Beethoven.

> %% Fuquay-Varina, NC % She love the way Puccini lays down a

tune, and

> %%% 919-577-9882 % Verdi's always creepin' from her room." > %%%% <yates@ieee.org> % "Rockaria", *A New World Record*, ELO > http://home.earthlink.net/~yatescr

Tim Wescott <tim@wescottnospamdesign.com> writes:

> Randy Yates wrote: > > - snip - > > Do you think that I specially-formulated the posted code so that it > > > would compile efficiently? I did not. In fact the code you have in > > that post is the very first thing I tried. Tim Wescott had asserted > > that vector multiplication was inefficient in C on a DSP. Now I'm not > > sure what sort of strange variations on vector multiplies you want to > > present as exceptions, but it seems to me that the operation is pretty > > clearly defined. > > > > I think that for the most part we're in violent agreement - but you > seem to have (a) have worked mostly with DSP chips with very good tool > support, and (b) to have missed out on the learning curve that many > software folks have when doing DSP for the first time.

Tim, Perhaps the problem is an ambiguity in what the question is. I thought the question was, "Is it feasible syntactically to perform fixed-point arithmetic in C using standard C integer data types?" To that my answer was and still is yes. If instead the question was, "Are you reasonably assured you won't get a significant speed penalty over assembly when performing fixed-point arithmetic (and other types of operations) on a DSP in C across various platforms?", then I'd agree the answer is no, but I didn't hear anyone ask that question.

> Your speed ratios are contrary to my experience with CISC and RISC > processors, and with older DSP toolsets. I see much closer to 1:1 > ratios with the '2812, but even there you can slow things down if > you're not careful, and the optimizer will sometimes take your code > and fling it on the trash pile without telling you (which, I assume, > is why they call it Code Composter). I even tried to get Code > Composter for the TMS320F2812 to cough up a MAC instruction -- it > wouldn't do it for love or money. I'm tempted to try your code > snippet on one the next time I'm at that particular customer site, > just to see what happens. > > > I'm curious though -- since you cite that code snippet why haven't you > commented on the non-ANSI translation done by the '5xxx "C" compiler?

I just did in an alternate part of this thread. -- Randy Yates Sony Ericsson Mobile Communications Research Triangle Park, NC, USA randy.yates@sonyericsson.com, 919-472-1124

I omitted a point in my previous response. Here it is: Randy Yates <randy.yates@sonyericsson.com> writes:

> Tim, > > I respond below to your points: > > Tim Wescott <tim@wescottnospamdesign.com> writes:

> [...]

> > (B) As usual TI is playing fast and loose with the ANSI standard, and > > that isn't even close to ANSI-compatible C. If it were the x[n] * > > y[n] operation would be truncated to 16 bits before being added to > > acc, and the result would be meaningless. Compile that up on machine > > that supports 16-bit and 32-bit integers, print out the results, and > > see what I mean.

You cannot really even begin to state this unless you know how the types I've custom-defined (INT16_T, INT32_T, e.g.) are defined, and what an "int" represents on the TI 54x platform. That is because any "non-long" and "non-long long" integer types are promoted to "int"s. It so happens that an INT16_T is a "short int" and an "int" is 16 bits, so your statement is correct, but I'm not sure you went through the proper integer promotion rules to get to your conclusion. -- Randy Yates Sony Ericsson Mobile Communications Research Triangle Park, NC, USA randy.yates@sonyericsson.com, 919-472-1124

Tim,

I respond below to your points:

Tim Wescott <tim@wescottnospamdesign.com> writes:

> Randy Yates wrote:
> 
> > Tim Wescott <tim@wescottnospamdesign.com> writes:
> >
> 
> >>Randy Yates wrote:
> >>
> >>
> >>>Tim Wescott <tim@wescottnospamdesign.com> writes:
> >>>
> >>
> >>>>[...]
> >>>>Generally if you stick to pure C you are stuck with integer math.
> >>>>DSP's are designed to do fixed-radix math pretty quickly, ...
> >>>
> >>>Tim, I think most of your points are helpful, but this one is
> >>>off-the-mark
> >>
> >>>in my judgement. The typical fixed-point DSP operates much the same as
> >>>the C integer operations, performing integer math. Whether the
> >>>integers are reinterpreted to be fractional, fixed-point, or integer
> >>>is all in the interpretation and has little or nothing to do with the
> >>>implementation of the basic arithmetic operations (add, subtract,
> >>>multiply).
> >>>Of course there are differences between fixed-point DSP ALUs and the
> >>
> >>>"ALU" of a C compiler, the biggest of which are probably the wide
> >>>accumulators and the saturation options when performing various
> >>>operations. There is also the typical "left shift by 1" that a
> >>>fractional DSP does after a multiply to make the result fractional,
> >>>but that is certainly doable in C as well, albeit manually.
> >>
> >>The difference in clock ticks between implementing a fixed-point
> >>arbitrary-radix vector dot-product in assembly on a DSP and trying to
> >>do the same thing to the same precision in C on the same processor is
> >>on the order of 100:1.
> > Who said anything about a vector operation? Your statement was
> 
> >   Generally if you stick to pure C you are stuck with integer math.
> 
> >   DSP's are designed to do fixed-radix math pretty quickly, ...
> > The term "math" does not mean "vector math" in my interpretation.
> 
> >
> 
> >>Even on a MAC-less processor when you are in assembly and multiply two
> >>signed numbers N-bit numbers you can choose to take the lower N bits
> >>of the 2N-1-bit result as C does, or you can take the upper N-1 bits
> >>and do a shift, with way fewer clock cycles (10 or 20:1) than you
> >>could implement the same functionality in C.  I should know -- I've
> >>done it in C a couple of times and in assembly on three or four
> >>different processors.
> > Apparently they did not include the TI TMS320C54x, arguably one of
> 
> > the most popular DSPs around, and on that processor, the following
> > code
> > #include "dsptypes.h"
> 
> > /* definitions */
> 
> > #define VECTOR_LENGTH 64
> 
> > /* local variables */
> 
> > /* local function prototypes */
> 
> > /* function definitions */
> 
> > int main(int margc, char **margv) {
> 
> >    UINT16_T n;
> >    INT16_T  x[VECTOR_LENGTH];
> >    INT16_T  y[VECTOR_LENGTH];
> >    INT32_T  acc;
> >    INT16_T  result;
> >    acc = 0;
> 
> >    for (n = 0; n < VECTOR_LENGTH; n++)
> >    {
> >       x[n] = n;
> >       y[n] = VECTOR_LENGTH - n - 1;
> >    }
> >    acc = 0;
> 
> >    for (n = 0; n < VECTOR_LENGTH; n++)
> >    {
> >       acc += x[n] * y[n];
> >    }
> >    result = (INT16_T)(acc >> 16);
> 
> >    return result;
> 
> > }
> > produces the following assembly language
> 
> > 0000:0108      main
> 
> > 0000:0108 4A11      PSHM  11h
> > 0000:0109 4A17      PSHM  17h
> > 0000:010A EE80      FRAME -128
> > 0000:010B E781      MVMM  SP,AR1
> > 0000:010C 6DE9      MAR   *+AR1(64)
> > 0000:010E E787      MVMM  SP,AR7
> > 0000:010F E782      MVMM  SP,AR2
> > 0000:0110 E800      LD    #0h,A
> > 0000:0111 771A      STM   3fh,1ah
> > 0000:0113 F072      RPTB  11ah
> > 0000:0115      L1
> > 0000:0115 8092      STL   A,*AR2+
> > 0000:0116 E93F      LD    #3fh,B
> > 0000:0117 F520      SUB   A,0,B
> > 0000:0118 8191      STL   B,*AR1+
> > 0000:0119 F000      ADD   #1h,0,A,A
> > 0000:011B      L2
> > 0000:011B E782      MVMM  SP,AR2
> > 0000:011C 6DEA      MAR   *+AR2(64)
> > 0000:011E E783      MVMM  SP,AR3
> > 0000:011F E800      LD    #0h,A
> > 0000:0120 EC3F      RPT   #3fh
> > 0000:0121      L3
> > 0000:0121 B089      MAC   *AR2+,*AR3+,A,A
> > 0000:0122      L4
> > 0000:0122 F0E0      SFTL  A,0,A
> > 0000:0123 F0F0      SFTL  A,-16,A
> > 0000:0124 6BF8      ADDM  80h,*(18h)
> > 0000:0127 F495      NOP   0000:0128 F495      NOP   0000:0129 8A17
> > POPM  17h
> 
> > 0000:012A 8A11      POPM  11h
> > 0000:012B F4E4      FRET  Both the vector multiply and the end shift
> > look pretty damn efficient
> 
> > to me, Tim. Thus even if we agree to interpret your point
> > differently, it's still
> 
> > inaccurate for one of the most popular DSPs in the world.
> 
> (A) That is the _only_ case that I know of for sure that the compiler
> can figure out it needs to use a MAC and shift -- the version of Code
> Composter that comes with the '2812 certainly doesn't do this, or I
> couldn't find the magic finger-ring combination.

By "case" do you mean "case of compiler version and architecture"? 

Perhaps you are right. I didn't really "sneak" this combination on you
- I work with the 54x a LOT and it was the easiest thing for me to
try. And the vector multiply sample seems straightforward.

CEVA's Teak compiler produced 22 lines of assembly for the
multiply-accumulate step, so it is clearly inefficient. Due to
licensing and confidentiality issues, I'm not posting the result here.

However, the TI 5510 C compiler also produced efficient code:

;*******************************************************************************
;* TMS320C55x C/C++ Codegen                                  Unix Version 2.40 *
;* Date/Time created: Wed Mar  9 09:07:28 2005                                 *
;*******************************************************************************
	.mmregs
	.cpl_on
	.arms_on
	.c54cm_off
	.asg AR6, FP
	.asg XAR6, XFP
	.asg DPH, MDP
	.model call=c55_std
	.model mem=large
	.noremark 5549  ; code avoids SE CPU_28
	.noremark 5558  ; code avoids SE CPU_33
	.noremark 5570  ; code avoids SE CPU_40
	.noremark 5571  ; code avoids SE CPU_41
	.noremark 5573  ; code avoids SE CPU_43
	.noremark 5584  ; code avoids SE CPU_47
	.noremark 5599  ; code avoids SE CPU_55
	.noremark 5503  ; code avoids SE CPU_84 MMR write
	.noremark 5505  ; code avoids SE CPU_84 MMR read
	.noremark 5002  ; code respects overwrite rules
;*******************************************************************************
;* GLOBAL FILE PARAMETERS                                                      *
;*                                                                             *
;*   Architecture       : TMS320C55x                                           *
;*   Optimization       : Always Choose Smaller Code Size                      *
;*   Memory             : Large Model (23-Bit Data Pointers)                   *
;*   Calls              : Normal Library ASM calls                             *
;*   Debug Info         : Standard TI Debug Information                        *
;*******************************************************************************
	.file	"/home/unix/us057845/projects/simmac/ti/simmac.c"
;	opt55 -m -O3 /var/tmp/aaaa0057b /var/tmp/daaa0057b -w /home/unix/us057845/projects/simmac/ti/dsp-5510/ 

	.sect	".text"
	.align 4
	.global	_main
	.sym	_main,_main, 36, 2, 0
	.func	13
;----------------------------------------------------------------------
;  13 | int main(int margc, char **margv)                                      
;----------------------------------------------------------------------
;*******************************************************************************
;* FUNCTION NAME: _main                                                        *
;*                                                                             *
;*   Function Uses Regs : AC0,AC0,AC1,AC1,T0,AR1,AR2,XAR2,AR3,XAR3,AR4,FP,XFP, *
;*                        SP,BRC0,CARRY,M40,SATA,SATD,RDM,FRCT,SMUL            *
;*   Save On Entry Regs : FP                                                   *
;*   Stack Frame        : Full (Frame Pointer in AR6, w/ debug)                *
;*   Total Frame Size   : 131 words                                            *
;*                        (2 return address/alignment)                         *
;*                        (1 frame pointer)                                    *
;*                        (128 local values)                                   *
;*******************************************************************************

;*******************************************************************************
;*                                                                             *
;* Using -g (debug) with optimization (-o3) may disable key optimizations!     *
;*                                                                             *
;*******************************************************************************
_main:
	.line	2
;----------------------------------------------------------------------
;  15 | UINT16_T n;                                                            
;  16 | INT16_T  x[VECTOR_LENGTH];                                             
;  17 | INT16_T  y[VECTOR_LENGTH];                                             
;  18 | INT32_T  acc;                                                          
;  19 | INT16_T  result;                                                       
;  21 | acc = 0;                                                               
;----------------------------------------------------------------------
;* T0    assigned to _margc
	.sym	_margc,12, 4, 17, 16
;* AR0   assigned to _margv
	.sym	_margv,17, 82, 17, 23
;* BRC0  assigned to L$1
;* BRC0  assigned to L$2
;* AR1   assigned to L$2
;* AR1   assigned to L$1
;* AR3   assigned to U$12
;* AR3   assigned to U$12
;* AR2   assigned to U$5
;* AR2   assigned to U$5
;* AR1   assigned to _n
	.sym	_n,18, 13, 4, 16
;* AC0   assigned to _acc
	.sym	_acc,0, 5, 4, 32
	.sym	_x,0, 51, 1, 1024,, 64
	.sym	_y,64, 51, 1, 1024,, 64
        PSHBOTH XFP
        ADD #-128, mmap(SP)
        MOV XSP, XAR3
        MOV XSP, XAR2
        AMAR *+AR3(#64)
        MOV XSP, XFP
	.line	10
;----------------------------------------------------------------------
;  22 | for (n = 0; n < VECTOR_LENGTH; n++)                                    
;----------------------------------------------------------------------
        MOV #63, BRC0

        RPTBLOCAL L2-1
||      MOV #0, AR1

                                            ; loop starts
L1:    
	.line	12
;----------------------------------------------------------------------
;  24 | x[n] = n;                                                              
;----------------------------------------------------------------------
            MOV AR1, *AR2+ ; |24| 
	.line	13
;----------------------------------------------------------------------
;  25 | y[n] = VECTOR_LENGTH - n - 1;                                          
;----------------------------------------------------------------------
            MOV #63, AR4 ; |25| 
            SUB AR1, AR4 ; |25| 
            MOV AR4, *AR3+ ; |25| 
	.line	14
            ADD #1, AR1 ; |26| 
                                        ; loop ends	; |26| 
L2:    
        MOV XSP, XAR3
        MOV XSP, XAR2
        AMAR *+AR3(#64)
	.line	16
;----------------------------------------------------------------------
;  28 | acc = 0;                                                               
;  29 | for (n = 0; n < VECTOR_LENGTH; n++)                                    
;----------------------------------------------------------------------
        MOV #63, BRC0

        RPTBLOCAL L4-1
||      MOV #0, AC0 ; |28| 

                                            ; loop starts
L3:    
	.line	19
;----------------------------------------------------------------------
;  31 | acc += x[n] * y[n];                                                    
;----------------------------------------------------------------------
            MPYM *AR3+, *AR2+, AC1 ; |31| 
            MOV mmap(AC1L), AC1 ; |31| 
            ADD AC1, AC0 ; |31| 
	.line	20
;----------------------------------------------------------------------
;  34 | result = (INT16_T)(acc >> 16);                                         
;----------------------------------------------------------------------
                                        ; loop ends	; |32| 
L4:    
	.line	24
;----------------------------------------------------------------------
;  36 | return result;                                                         
;----------------------------------------------------------------------
        MOV HI(AC0), T0
	.line	25
        ADD #128, mmap(SP) ; |36| 
        POPBOTH XFP
        RET       ; |36| 
                                        ; return occurs	; |36| 
	.endfunc	37,000000080h,129



;*******************************************************************************
;* TYPE INFORMATION                                                            *
;*******************************************************************************
	.sym	_INT16_T, 0, 3, 13, 16
	.sym	_UINT16_T, 0, 13, 13, 16
	.sym	_INT32_T, 0, 5, 13, 32


> (B) As usual TI is playing fast and loose with the ANSI standard, and
> that isn't even close to ANSI-compatible C.  If it were the x[n] *
> y[n] operation would be truncated to 16 bits before being added to
> acc, and the result would be meaningless.  Compile that up on machine
> that supports 16-bit and 32-bit integers, print out the results, and
> see what I mean.

According to ANSI specification ISO/IEC 9899-1999(E), the compiler is
not in violation. In a nutshell, the operation result is undefined and
so performing the operation in extended precision is acceptable. See
paragraph 2 of section 3.4.3 on "undefined behavior."

However, you're right in that what I wrote was a bit sloppy and
non-portable.  The fix is very simple; change the line in question to:

  acc += (INT32_T)x[n] * y[n];.

> Furthermore, you are actually making my point:  by starting with an
> awareness of the one thing that sets a DSP apart from the rest and
> warping your code to fit that one thing you can make the operation
> very fast.  But in production code you will have to be constantly on
> guard to make sure that the C code isn't "improved" in such a way that
> makes the compiler implement it as a bunch of "traditional" integer
> operations, thereby making it take 10-100 times slower, and likely
> reintroducing the truncation (more like 10, TI is good at making fast
> processors).

Well I'm not sure what your point is anymore. First you started out by
stating that one is stuck with integer math when using pure C.  Then
you backed off and stated that your point was that vector multiplies
are inefficient. Now it seems like you're saying "they're inefficient
sometimes on some platforms." At least that seems to be the case. 

The key point I wanted to make to the OP and to you is that it is
feasible to perform fixed-point arithmetic in C (whether on a DSP or
not), and that it even runs reasonably well on some platforms, i.e.
within 1000 percent of hand-tweaked assembly. (This definition of
reasonable is certainly debatable depending on the application's
requirements.)

A side issue that developed (i.e., via Jerry) seems to be whether or
not the fractional "left shift by one" is significant in the
determination to perform fixed-point processing in C. I say it is not
for the following reasons:

1. The "left shift by one" issue is not really just for "fractional"
processing but rather for any fixed-point multiply. The question is
generally whether the total dynamic range needs to be preserved or
the SNR is more important. 

2. A requirement to left shift by one can be brought out of the
inner loop in MAC type of operations and thus be made negligible.

3. It is not always desirable to left shift by one and take the high
16 bits after multiplying two fixed-point 16-bit values into a
resulting 32-bit value. The N bits you take from M bits that result
from a fixed-point operation when N < M can be all over the map
depending on several application issues.

-- 
Randy Yates
Sony Ericsson Mobile Communications
Research Triangle Park, NC, USA
randy.yates@sonyericsson.com, 919-472-1124

Daniel: Check out Tables 8-3 and 8-4 of section 8.5.2 of the C6000 Compilers user guide: http://focus.ti.com/lit/ug/spru187l/spru187l.pdf this sections lists the intrinsics that the 64x compiler recognizes. These intrinsics contain operations common in Q15 operations. For example _smpy( ) will multiply and shift by 1 with saturation protection. I also see that the "int" data type on the 64 is 32bits, a "short" is 16bits and a "long" is 40 bits (table 7-1), so keep that in mind as well. -Shawn "Daniel" <d.lohausen@freenet.de> wrote in message news:e83ccc31.0503080137.2f641499@posting.google.com...

> d.lohausen@freenet.de (Daniel) wrote in message

news:<e83ccc31.0503030745.68a487cc@posting.google.com>...

> > Hello Everybody, > > > > for my diploma thesis, I have to implement a Least-Mean-Square > > Algorithm on a fixed-point DSP (TI 6416). The LMS was implemented on a > > floating-point processor(TI 6713) earlier, so I just to the code and > > copied it. Of course, there are a lot of float variables in the code. > > When I ran the program, it workes for small FIR orders (6), but the > > larger the order of the filter, the worse the result. > > Is this because the 6416 cannot work with floating-point numbers > > accuretly? > > What can I do? When I convert all the float variables to integers I > > get overflow problems. > > > > Thanks a lot > > Daniel > > Hello Everybody, > > thanks for all you help. I found out that the algorithm was too slow > in my application. For higher filter orders, the LMS was still > calculating when new samples from the codec had arrived. A software > interrupt was called that started the LMS from the beginning, so it > didn`t end the previous cycle. > Now I removed all the codec stuff and I works for higher filter > orders, but, of course, far too slow. Now I have to adapt the > algorithm to fixed-point. > > I read all the posts and I have a new question: I understand what the > Q.15 data type is but how can I use in in C? Is there a special data > type or what? > > Thanks again > Daniel

d.lohausen@freenet.de (Daniel) writes:

> d.lohausen@freenet.de (Daniel) wrote in message news:<e83ccc31.0503030745.68a487cc@posting.google.com>... > > Hello Everybody, > > > > for my diploma thesis, I have to implement a Least-Mean-Square > > Algorithm on a fixed-point DSP (TI 6416). The LMS was implemented on a > > floating-point processor(TI 6713) earlier, so I just to the code and > > copied it. Of course, there are a lot of float variables in the code. > > When I ran the program, it workes for small FIR orders (6), but the > > larger the order of the filter, the worse the result. > > Is this because the 6416 cannot work with floating-point numbers > > accuretly? > > What can I do? When I convert all the float variables to integers I > > get overflow problems. > > > > Thanks a lot > > Daniel > > Hello Everybody, > > thanks for all you help. I found out that the algorithm was too slow > in my application. For higher filter orders, the LMS was still > calculating when new samples from the codec had arrived. A software > interrupt was called that started the LMS from the beginning, so it > didn`t end the previous cycle. > Now I removed all the codec stuff and I works for higher filter > orders, but, of course, far too slow. Now I have to adapt the > algorithm to fixed-point. > > I read all the posts and I have a new question: I understand what the > Q.15 data type is but how can I use in in C? Is there a special data > type or what?

No. Simply use a 16-bit integer on your platform and then follow the rules for fixed-point arithmetic, which I give in my paper: http://home.earthlink.net/~yatescr/fp.pdf You should also be careful, as Tim Wescott has correctly pointed out, to ensure intermediate results are properly computed by the C compiler. E.g., to ensure the two 16-bit values are multiplied using a 32-bit result, do the following explicit cast: INT32_T y; /* this is scaled A(1, 30) INT16_T x1; /* this is scaled A(0, 15) INT16_T x2; /* this is scaled A(0, 15) y = (INT32_T)x1 * x2. Similarly if you want to add two A(0, 15) values, you still will want to explicitly cast one operand to 32 bits if you want to ensure you avoid overflows. You're going to have to become an expert on C type conversions to be sure things work the way you expect. -- Randy Yates Sony Ericsson Mobile Communications Research Triangle Park, NC, USA randy.yates@sonyericsson.com, 919-472-1124

Daniel wrote:

> d.lohausen@freenet.de (Daniel) wrote in message

news:<e83ccc31.0503030745.68a487cc@posting.google.com>...

> > Hello Everybody, > > > > for my diploma thesis, I have to implement a Least-Mean-Square > > Algorithm on a fixed-point DSP (TI 6416). The LMS was implemented

on a

> > floating-point processor(TI 6713) earlier, so I just to the code

and

> > copied it. Of course, there are a lot of float variables in the

code.

> > When I ran the program, it workes for small FIR orders (6), but the > > larger the order of the filter, the worse the result. > > Is this because the 6416 cannot work with floating-point numbers > > accuretly? > > What can I do? When I convert all the float variables to integers I > > get overflow problems. > > > > Thanks a lot > > Daniel > > Hello Everybody, > > thanks for all you help. I found out that the algorithm was too slow > in my application. For higher filter orders, the LMS was still > calculating when new samples from the codec had arrived. A software > interrupt was called that started the LMS from the beginning, so it > didn`t end the previous cycle. > Now I removed all the codec stuff and I works for higher filter > orders, but, of course, far too slow. Now I have to adapt the > algorithm to fixed-point. > > I read all the posts and I have a new question: I understand what the > Q.15 data type is but how can I use in in C? Is there a special data > type or what? > > Thanks again > Daniel

Makes sense. Just use integer type and play around with a few multiples and look at the results, everything will make sense if you try a small algorithm.

Daniel wrote:

> d.lohausen@freenet.de (Daniel) wrote in message

news:<e83ccc31.0503030745.68a487cc@posting.google.com>...

> > Hello Everybody, > > > > for my diploma thesis, I have to implement a Least-Mean-Square > > Algorithm on a fixed-point DSP (TI 6416). The LMS was implemented

on a

> > floating-point processor(TI 6713) earlier, so I just to the code

and

> > copied it. Of course, there are a lot of float variables in the

code.

> > When I ran the program, it workes for small FIR orders (6), but the > > larger the order of the filter, the worse the result. > > Is this because the 6416 cannot work with floating-point numbers > > accuretly? > > What can I do? When I convert all the float variables to integers I > > get overflow problems. > > > > Thanks a lot > > Daniel > > Hello Everybody, > > thanks for all you help. I found out that the algorithm was too slow > in my application. For higher filter orders, the LMS was still > calculating when new samples from the codec had arrived. A software > interrupt was called that started the LMS from the beginning, so it > didn`t end the previous cycle. > Now I removed all the codec stuff and I works for higher filter > orders, but, of course, far too slow. Now I have to adapt the > algorithm to fixed-point. > > I read all the posts and I have a new question: I understand what the > Q.15 data type is but how can I use in in C? Is there a special data > type or what? > > Thanks again > Daniel

I think this has been discussed in this thread already. John

d.lohausen@freenet.de (Daniel) wrote in message news:<e83ccc31.0503030745.68a487cc@posting.google.com>...

> Hello Everybody, > > for my diploma thesis, I have to implement a Least-Mean-Square > Algorithm on a fixed-point DSP (TI 6416). The LMS was implemented on a > floating-point processor(TI 6713) earlier, so I just to the code and > copied it. Of course, there are a lot of float variables in the code. > When I ran the program, it workes for small FIR orders (6), but the > larger the order of the filter, the worse the result. > Is this because the 6416 cannot work with floating-point numbers > accuretly? > What can I do? When I convert all the float variables to integers I > get overflow problems. > > Thanks a lot > Daniel

Hello Everybody, thanks for all you help. I found out that the algorithm was too slow in my application. For higher filter orders, the LMS was still calculating when new samples from the codec had arrived. A software interrupt was called that started the LMS from the beginning, so it didn`t end the previous cycle. Now I removed all the codec stuff and I works for higher filter orders, but, of course, far too slow. Now I have to adapt the algorithm to fixed-point. I read all the posts and I have a new question: I understand what the Q.15 data type is but how can I use in in C? Is there a special data type or what? Thanks again Daniel