Jagadeesh-

> Agreed! It is hard to know the expertise level of the person
> from an e-mail, so I erred in the direction of providing more
> information. But, let's move on!
>
> I do look forward to stimulating discussions on optimizations
> for C62x, C64x and C67x. I will keep my answers fairly general,
> so that everybody in the user's group can benefit.

You know of course that any time the discussion focuses on optimization, I will drag out your name and refer to your original Nov/Dec '02 posts, which I have memorized. That's what you get for being an expert :-)

-Jeff

Hi Wojciech Rewers,

>well... as I said before - I do respect all participants of this group and all people in
>general... I admit - my remark was a bit "cynical" but spitful?

I am sorry, you have misunderstood the word I used here! I meant spiteful (showing a disposition or inclination to annoy or hurt). I guess it's a lot different from spitful!

Anyway, I will have to concede, even spiteful was probably over the top. So sorry about that! Just wanted to prove a point!

>well... I'm not going to mow anybody down or anything... plus - I don't consider the groups
>as a number 1 source of knowledge - I have been participating in the group for many
>months now and I was trying to help much more often than I actually asked about anything...

I agree, you are trying to be an active member. No qualms there either.

>I didn't mow down Rick when he tried to help me - we both agreed that
>neither of us is an expert on C6x assembly...

Well, even there I was slightly uneasy with your tongue-in-cheek comment about whether Rick had ever programmed the C6000 at all... but I'll let that one pass...

>sum up - I don't think I'm spitful towards anybody -

Not spiteful perhaps, maybe more like cynical, if you will...

>and even if I was, here it goes:
>I do apologize to Jagadeesh and anybody else that I ever offended on this group...

That's a nice gesture. Just Sankaran, nobody else...

>I'm sorry - it's just my sense of humour - I can't
>resist... is it about mr Sankaran or TI engineers? ;-)

I am there with you on that. :)

>well - this will not be about mr Sankaran or anyone else in particular - rather a general
>statement about being clever... I think being a good teacher means
>giving clever one-line answers wherever you can!

Not always, definitely not all the time?

>I believe my help on this group is like that - I try not to give lectures to people

Some people, even if they try, can't give good lectures. And some people give great lectures.

It's an altogether different issue whether you like lectures or not!

>well... as I said - I don't hold anything against anybody on this group - on the contrary - I
>appreciate a good discussion as it's just another stimulus for my brain... that's all...

Couldn't agree with you more (notwithstanding my desire to disagree!) on that. Your current thread has been a great stimulus all right. It has opened up my eyes to so many new things.

>so - now I wish everybody a nice day ;-) I know it's morning out there in the US ;-)

Good day to you too, Mr. Rewers. But hey, I live in India. It's night here. So, good night will do fine.

PS: You are a good engineer and you seem to have all the right things. I am sure you will make it in the field. Come to India??!!

Bhooshan


Agreed! It is hard to know the expertise level of the person from an e-mail, so I erred in the direction of providing more information. But, let's move on!

I do look forward to stimulating discussions on optimizations for C62x, C64x and C67x. I will keep my answers fairly general, so that everybody in the user's group can benefit.

Regards
Jagadeesh Sankaran

> Hi all, and Wojciech Rewers,

hi all...

>> yeah - all squares are rectangles, but not all rectangles are squares - logic taught to a 10 year old...
> May be iam slightly old fashioned here...but are'nt we supposed to be respectful to people who are trying to help us? and not be cynical/spiteful?

well... as I said before - I do respect all participants of this group and all people in general... I admit - my remark was a bit "cynical" but spitful? hm... after all - I only rephrased what Jagadeesh said and I quote:

    All double-word addresses are word_aligned. Not all word aligned addresses are double word aligned.

well - for me this is a logic taught to a 10 year old and there is nothing spitful in it... and my comment was not to insult Jagadeesh or anybody else - it was rather in the tone of "yeah - let's skip the obvious and get to the point"...

> Wow, it makes me wonder what will happen if someone gives a wrong answer to your question,you would mow them down? intellectually,perhaps? what with all your "i need know why am doing what am doing,or ill kill you attitude"?

well... I'm not going to mow anybody down or anything... plus - I don't consider the groups as a number 1 source of knowledge - I have been participating in the group for many months now and I was trying to help much more often than I actually asked about anything... now I asked about this sample-by-sample FIR and with the help from the group I managed to develop the code I needed - I appreciate the group's help... that's all...

but my point is - I don't demand any help from anybody - if you can help me - or have something to add in the subject - go ahead and do so - but I'm not going to mow you down if your contribution is of no value... I didn't mow down Rick when he tried to help me - we both agreed that neither of us is an expert on C6x assembly...

so - to sum up - I don't think I'm spitful towards anybody - and even if I was, here it goes: I do apologize to Jagadeesh and anybody else that I ever offended on this group...

> BTW,Mr.sankaran has been one of the brightest and most erudite ti engineer to participate in the group.

I'm sorry - it's just my sense of humour - I can't resist... is it about mr Sankaran or TI engineers? ;-)

> And if you notice he doesnt give CLEVER one line answers-He is a great teacher who PROVES most things he says.

well - this will not be about mr Sankaran or anyone else in particular - rather a general statement about being clever... I think being a good teacher means giving clever one-line answers wherever you can! I believe my help on this group is like that - I try not to give lectures to people - rather point to the source or even just give one hint from which the person in need can solve his problem... because after all - I'm not going to solve anybody's problems, but I do offer a hint wherever I can...

> Anyways, to your credit,(i hope...) i have to say, you seem(?) to be the kind who could take as much as you can give!
> So,friends? adults?
> So,Dont bother apologising, or may be you should!

well... as I said - I don't hold anything against anybody on this group - on the contrary - I appreciate a good discussion as it's just another stimulus for my brain... that's all...

so - now I wish everybody a nice day ;-) I know it's morning out there in the US ;-)

Wojciech Rewers

Hi all, and Wojciech Rewers,

>yeah - all squares are rectangles, but not all
>rectangles are squares - logic taught to a 10 year old...

Maybe I am slightly old-fashioned here... but aren't we supposed to be respectful to people who are trying to help us, and not be cynical/spiteful?

Wow, it makes me wonder what will happen if someone gives a wrong answer to your question. Would you mow them down? Intellectually, perhaps? What with all your "I need to know why I am doing what I am doing, or I'll kill you" attitude?

BTW, Mr. Sankaran has been one of the brightest and most erudite TI engineers to participate in the group. And if you notice, he doesn't give CLEVER one-line answers. He is a great teacher who PROVES most things he says. I agree, he didn't initially catch the "sample-by-sample" drift right away, but c'mon man... cut him some slack here!

Anyway, to your credit (I hope...), I have to say, you seem(?) to be the kind who could take as much as you can give!

So, friends? Adults?

So, don't bother apologising, or maybe you should!

Hah!

Bhooshan


My comments were general, meant to be clear, and not directed at the understanding, or lack of it, of any one individual. That was not my aim. I was trying to be clear and hope I did not annoy anybody. At the same time, I did not want to leave anybody out. First of all, apologies to all, Wojciech Rewers in particular, if I came across that way.

The reason that the stack pointer needs to be double-word aligned is to facilitate running code across multiple platforms, in particular C62x code on C67x, since the C67x can perform LDDW to load registers from the stack. Even then, the correct way to do it is to pre-decrement the stack by the number of words you intend to use upfront, and not decrement it one at a time. However, the code shown on Page 8-12 is correct, because within an ISR, unless you re-enable GIE {shown on next page of PRG}, you do not respond to interrupts. Further, since an even number of registers are pushed and popped, if the stack pointer is double-word aligned to begin with, it will be double-word aligned at the end as well. So, this example "happens" to work.

However, the "best" way to do it would be the way the C compiler does it {yet another reason why tools are better }. I will show the assembly statements used by the C compiler to accomplish maintenance of the stack. Notice that even the loads from the stack are being done using load double words. Notice also that the whole frame is allocated up front in a single decrement (STW ... *SP--(24)) and released in a single increment at the end (LDW ... *++SP(24)).

Saves to stack:

           MV      .S1X    SP,A9             ; |5|
||         STW     .D2T1   A10,*SP--(24)     ; |5|
           STW     .D2T2   B13,*+SP(20)
           MVK     .S2     32,B5
||         STW     .D2T2   B12,*+SP(16)
           SUB     .L2X    A4,B5,B5
||         STW     .D2T2   B11,*+SP(12)
           MV      .S1X    DP,A10            ; save dp
||         STW     .D1T1   A14,*-A9(20)
||         MV      .L2     B4,B11
||         STW     .D2T2   B10,*+SP(8)
||         MVC     .S2     CSR,B4
           ADD     .L2     4,B5,B12
||         LDDW    .D2T2   *B11++(32),B7:B6  ; |47| (P) <0,1>
||         MV      .S1X    B4,A8
||         AND     .S2     -2,B4,B4
           SHR     .S1     A6,3,A4           ; |47|
||         MV      .L1X    B5,A6
||         MVC     .S2     B4,CSR            ; interrupts off
||         LDW     .D2T2   *++B12(32),DP     ; |47| (P) <0,0>

Restores from the stack:

           MV      .S1X    SP,A9             ; |55|
||         ADDSP   .L2     B7,B1,B1          ; |47| (E) <3,14>  ^
           ADDSP   .L2     B4,B2,B2          ; |47| (E) <3,15>  ^
           ADDSP   .L1     A5,A0,A0          ; |47| (E) <3,16>  ^
||         ADDSP   .L2     B4,B13,B13        ; |47| (E) <3,16>  ^
           LDDW    .D2T2   *+SP(8),B11:B10   ; |55|
||         MV      .S1X    B10,A8
||         MV      .S2X    A8,B4
           MV      .S2X    A10,DP            ; restore dp
           MVC     .S2     B4,CSR            ; interrupts on
           LDDW    .D2T2   *+SP(16),B13:B12  ; |55|
||         ADDSP   .L1X    A8,B13,A6         ; |54|
           NOP             3
           LDW     .D1T1   *+A9(4),A14       ; |55|
||         MV      .S2X    A14,B3            ; |55|
||         ADDSP   .L1X    B3,A6,A3          ; |54|
           LDW     .D2T1   *++SP(24),A10     ; |55|

Regards
Jagadeesh Sankaran
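The sizing rule in this post (reserve the whole save area in one step, and keep it a multiple of 8 bytes) can be sketched in C. This is only an illustration of the arithmetic; `frame_bytes` is a hypothetical helper name, not something the compiler or the PRG provides.

```c
#include <stddef.h>

/* Hypothetical helper: bytes to reserve on the stack for `words` saved
   32-bit registers, rounded up to the next multiple of 8 so that a
   double-word aligned SP stays double-word aligned. */
size_t frame_bytes(size_t words)
{
    size_t bytes = words * 4u;          /* one 32-bit word per register */
    return (bytes + 7u) & ~(size_t)7u;  /* round up to a multiple of 8  */
}
```

With six saved registers, as in the listing, this gives 24 bytes; with three it gives 16 rather than 12.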

There is a way to get 1.6 multiplies/cycle for single sample FIR. This involves carrying two versions: one for the even output samples, where both the input and the filter arrays are double-word aligned, and one for the odd output samples, where the input is word aligned and the filter array is double-word aligned. In this case you compute even output samples at the rate of 2 multiplies/cycle and odd output samples at the rate of 1.2 multiplies/cycle. Since both versions are called an equal number of times, in steady state you get an averaging effect of computing at about 1.6 multiplies/cycle (the simple mean of the two rates; weighted by the cycles each version actually takes, the combined figure is 1.5).

double-word aligned: addresses that end in 0x0 and 0x8.
word aligned:        addresses that end in 0x0, 0x4, 0x8, 0xC.

All double-word addresses are word aligned. Not all word aligned addresses are double word aligned.

You can request alignment from C for the linker to use by saying:

    #pragma DATA_ALIGN(x, 8)
    float x[100];

These statements align the start of array x, or x itself, to be double-word aligned from C.

Regards
Jagadeesh Sankaran
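As a small sketch of the two alignment definitions in this post, the checks can be written as bit tests on the address. The function names are hypothetical and the code is plain C, nothing C6000-specific.

```c
#include <stdint.h>

/* Non-zero when addr is a multiple of 4 (ends in 0x0, 0x4, 0x8 or 0xC). */
int is_word_aligned(uintptr_t addr)
{
    return (addr & 0x3u) == 0;
}

/* Non-zero when addr is a multiple of 8 (ends in 0x0 or 0x8). */
int is_dword_aligned(uintptr_t addr)
{
    return (addr & 0x7u) == 0;
}
```

An address such as 0x100C passes the first test but not the second, which is the "not all word aligned addresses are double word aligned" case.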

A lot of my experience has been with the fixed-point DSPs, C62x and more so C64x. However, I wanted to write out some code and prove to myself that the tools are indeed the best way to get there, even for an example as simple as an FIR.

Here is my first-pass stab at single sample FIR C code. This code models merely the FIR part of the dot-product; it does not model the circular buffer. I shall address this in the latter part of my e-mail.

    #include <stdio.h>
    #include <stdlib.h>

    float fir(float *restrict x, float *restrict h, int N)
    {
        int i;
        float sum;

        /*---------------------------------------------*/
        /* Initialize FIR accumulator to zero, prior   */
        /* to the start of the computation.            */
        /*---------------------------------------------*/
        sum = 0;

        /*---------------------------------------------*/
        /* If this is a C6000 build, then inform the   */
        /* compiler about any safe assumptions that    */
        /* can be made. In this case we assume that    */
        /* the input array is word aligned and the     */
        /* filter array is double word aligned. I am   */
        /* assuming that your filter has at least 16   */
        /* taps and is a multiple of 8.                */
        /*---------------------------------------------*/
        #ifdef TMS320C6X
        _nassert((int)(x)%4 == 0);
        _nassert((int)(h)%8 == 0);
        _nassert((int)(N)%8 == 0);
        _nassert((int)(N >= 16));
        #endif

        /*---------------------------------------------*/
        /* The following loop iterates over N filter   */
        /* taps computing the FIR sum, one tap at a    */
        /* time.                                       */
        /*---------------------------------------------*/
        for ( i = 0; i < N; i++)
        {
            /*-----------------------------------------*/
            /* Compute sum of products over all filter */
            /* taps accumulating the result into sum.  */
            /*-----------------------------------------*/
            sum += x[i] * h[i];
        }

        /*---------------------------------------------*/
        /* Return accumulated single sample FIR.       */
        /*---------------------------------------------*/
        return sum;
    }

I compiled using the current shipping 4.31 tools. I used the following flags for my compile:

    cl6x -k -o2 -mwtx -mv6700 -mh -dTMS320C6X fir.c

The compiler produced the following output, in which up to two multiplies are issued per cycle. Although the code is written in a straightforward way, the compiler computes the odd and even taps in parallel and accumulates them into separate accumulators.
It finally adds the separate accumulators prior to returning. I will now reproduce the assembler output, which you can reproduce as well {Isn't that nice?}.

        .sect   ".text"
        .global _fir

;******************************************************************************
;* FUNCTION NAME: _fir                                                        *
;*                                                                            *
;*   Regs Modified     : A0,A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,A14,B0,B1,B2,B3,B4,*
;*                           B5,B6,B7,B8,B9,B10,B11,B12,B13,DP,SP             *
;*   Regs Used         : A0,A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,A14,B0,B1,B2,B3,B4,*
;*                           B5,B6,B7,B8,B9,B10,B11,B12,B13,DP,SP             *
;*   Local Frame Size  : 0 Args + 0 Auto + 24 Save = 24 byte                  *
;******************************************************************************
_fir:
;** --*
           MV      .S1X    SP,A9             ; |5|
||         STW     .D2T1   A10,*SP--(24)     ; |5|
           STW     .D2T2   B13,*+SP(20)
           MVK     .S2     32,B5
||         STW     .D2T2   B12,*+SP(16)
           SUB     .L2X    A4,B5,B5
||         STW     .D2T2   B11,*+SP(12)
           MV      .S1X    DP,A10            ; save dp
||         STW     .D1T1   A14,*-A9(20)
||         MV      .L2     B4,B11
||         STW     .D2T2   B10,*+SP(8)
||         MVC     .S2     CSR,B4
           ADD     .L2     4,B5,B12
||         LDDW    .D2T2   *B11++(32),B7:B6  ; |47| (P) <0,1>
||         MV      .S1X    B4,A8
||         AND     .S2     -2,B4,B4
           SHR     .S1     A6,3,A4           ; |47|
||         MV      .L1X    B5,A6
||         MVC     .S2     B4,CSR            ; interrupts off
||         LDW     .D2T2   *++B12(32),DP     ; |47| (P) <0,0>
           LDW     .D1T1   *++A6(32),A3      ; |47| (P) <0,2>
||         LDDW    .D2T2   *-B11(24),B5:B4   ; |47| (P) <0,2>
;*----*
;*   SOFTWARE PIPELINE INFORMATION
;*
;*      Loop source line                 : 40
;*      Loop opening brace source line   : 41
;*      Loop closing brace source line   : 48
;*      Loop Unroll Multiple             : 8x
;*      Known Minimum Trip Count         : 2
;*      Known Max Trip Count Factor      : 1
;*      Loop Carried Dependency Bound(^) : 4
;*      Unpartitioned Resource Bound     : 6
;*      Partitioned Resource Bound(*)    : 6
;*      Resource Partition:
;*                                A-side   B-side
;*      .L units                     2        6*
;*      .S units                     1        0
;*      .D units                     6*       6*
;*      .M units                     2        6*
;*      .X cross paths               2        4
;*      .T address paths             6*       6*
;*      Long read paths              0        0
;*      Long write paths             0        4
;*      Logical  ops (.LS)           0        0     (.L or .S unit)
;*      Addition ops (.LSD)          1        0     (.L or .S or .D unit)
;*      Bound(.L .S .LS)             2        3
;*      Bound(.L .S .D .LS .LSD)     4        4
;*
;*      Searching for software pipeline schedule at ...
;*         ii = 6  Schedule found with 4 iterations in parallel
;*
;*      Register Usage Table:
;*          +---------------------------------+
;*          |AAAAAAAAAAAAAAAA|BBBBBBBBBBBBBBBB|
;*          |0000000000111111|0000000000111111|
;*          |0123456789012345|0123456789012345|
;*          |----------------+----------------|
;*       0: |**** * | ** **** *** |
;*       1: |***** ** | ****** *** |
;*       2: |******** | *** ******* |
;*       3: | ******* |* *** ***** * |
;*       4: | **** * |** **** **** * |
;*       5: | *** * |*** ***** ** * |
;*          +---------------------------------+
;*
;*      Done
;*
;*      Epilog not entirely removed
;*      Collapsed epilog stages     : 2
;*
;*      Prolog not entirely removed
;*      Collapsed prolog stages     : 1
;*
;*      Minimum required memory pad : 64 bytes
;*
;*      Minimum safe trip count     : 1 (after unrolling)
;*----*
;*       SETUP CODE
;*
;*                  MV              A6,B12
;*                  ADD             4,B12,B12
;*
;*        SINGLE SCHEDULED ITERATION
;*
;*        C23:
;*   0              LDW     .D2T2   *++B12(32),DP     ; |47|
;*   1              LDDW    .D2T2   *B11++(32),B7:B6  ; |47|
;*   2              LDW     .D1T1   *++A6(32),A3      ; |47|
;*     ||           LDDW    .D2T2   *-B11(24),B5:B4   ; |47|
;*   3              LDW     .D2T2   *+B12(4),B6       ; |47|
;*     ||           LDW     .D1T1   *+A6(12),A3       ; |47|
;*   4              LDDW    .D2T2   *-B11(16),B7:B6   ; |47|
;*     ||           LDW     .D1T1   *+A6(16),A4       ; |47|
;*   5              LDW     .D1T1   *+A6(20),A4       ; |47|
;*     ||           LDDW    .D2T2   *-B11(8),B9:B8    ; |47|
;*   6              LDW     .D1T1   *+A6(24),A5       ; |47|
;*   7              MPYSP   .M1X    B6,A3,A4          ; |47|
;*     ||           MPYSP   .M2     B7,DP,B4          ; |47|
;*     ||           LDW     .D1T1   *+A6(28),A4       ; |47|
;*   8              MPYSP   .M2     B4,B6,B4          ; |47|
;*   9              MPYSP   .M2X    B6,A4,B8          ; |47|
;*  10              MPYSP   .M2X    B7,A4,B7          ; |47|
;*  11              ADDSP   .L1     A4,A7,A7          ; |47|  ^
;*     ||           ADDSP   .L2     B4,B3,B3          ; |47|  ^
;*     ||           MPYSP   .M2X    B8,A5,B4          ; |47|
;*  12              ADDSP   .L2     B4,B10,B10        ; |47|  ^
;*     ||           MPYSP   .M2X    B5,A3,B4          ; |47|
;*     ||           MPYSP   .M1X    B9,A4,A5          ; |47|
;*  13              ADDSP   .L2     B8,B0,B0          ; |47|  ^
;*     ||   [ A1]   SUB     .S1     A1,1,A1           ; |48|
;*  14              ADDSP   .L2     B7,B1,B1          ; |47|  ^
;*     ||   [ A1]   B       .S1     C23               ; |48|
;*  15              ADDSP   .L2     B4,B2,B2          ; |47|  ^
;*  16              ADDSP   .L2     B4,B13,B13        ; |47|  ^
;*     ||           ADDSP   .L1     A5,A0,A0          ; |47|  ^
;*  17              NOP             3
;*                  ; BRANCH OCCURS                   ; |48|
;*----*
L1:    ; PIPED LOOP PROLOG
           LDW     .D1T1   *+A6(12),A3       ; |47| (P) <0,3>
||         LDW     .D2T2   *+B12(4),B6       ; |47| (P) <0,3>
           LDW     .D1T1   *+A6(16),A4       ; |47| (P) <0,4>
||         LDDW    .D2T2   *-B11(16),B7:B6   ; |47| (P) <0,4>
           ZERO    .S2     B0                ; |47|
||         ZERO    .L2     B2                ; |47|
||         ZERO    .S1     A7                ; |47|
||         LDDW    .D2T2   *-B11(8),B9:B8    ; |47| (P) <0,5>
||         LDW     .D1T1   *+A6(20),A4       ; |47| (P) <0,5>
           ZERO    .S2     B1                ; |47|
||         ZERO    .L1     A0                ; |47|
||         ZERO    .L2     B13               ; |47|
||         MV      .S1X    B3,A14
||         LDW     .D2T2   *++B12(32),DP     ; |47| (P) <1,0>
||         LDW     .D1T1   *+A6(24),A5       ; |47| (P) <0,6>
           MVK     .S1     0x1,A2            ; init prolog collapse predicate
||         SUB     .L1     A4,1,A1
||         ZERO    .S2     B3                ; |47|
||         ZERO    .L2     B10               ; |47|
||         MPYSP   .M1X    B6,A3,A4          ; |47| (P) <0,7>
||         MPYSP   .M2     B7,DP,B4          ; |47| (P) <0,7>
||         LDDW    .D2T2   *B11++(32),B7:B6  ; |47| (P) <1,1>
||         LDW     .D1T1   *+A6(28),A4       ; |47| (P) <0,7>
;** --*
L2:    ; PIPED LOOP KERNEL
   [ A1]   B       .S1     L2                ; |48| <0,14>
|| [!A2]   ADDSP   .L2     B7,B1,B1          ; |47| <0,14>  ^
||         MPYSP   .M2     B4,B6,B4          ; |47| <1,8>
||         LDDW    .D2T2   *-B11(24),B5:B4   ; |47| <2,2>
||         LDW     .D1T1   *++A6(32),A3      ; |47| <2,2>
   [!A2]   ADDSP   .L2     B4,B2,B2          ; |47| <0,15>  ^
||         MPYSP   .M2X    B6,A4,B8          ; |47| <1,9>
||         LDW     .D1T1   *+A6(12),A3       ; |47| <2,3>
||         LDW     .D2T2   *+B12(4),B6       ; |47| <2,3>
   [!A2]   ADDSP   .L2     B4,B13,B13        ; |47| <0,16>  ^
|| [!A2]   ADDSP   .L1     A5,A0,A0          ; |47| <0,16>  ^
||         MPYSP   .M2X    B7,A4,B7          ; |47| <1,10>
||         LDDW    .D2T2   *-B11(16),B7:B6   ; |47| <2,4>
||         LDW     .D1T1   *+A6(16),A4       ; |47| <2,4>
           ADDSP   .L1     A4,A7,A7          ; |47| <1,11>  ^
||         MPYSP   .M2X    B8,A5,B4          ; |47| <1,11>
||         ADDSP   .L2     B4,B3,B3          ; |47| <1,11>  ^
||         LDDW    .D2T2   *-B11(8),B9:B8    ; |47| <2,5>
||         LDW     .D1T1   *+A6(20),A4       ; |47| <2,5>
   [ A2]   SUB     .S1     A2,1,A2           ; <0,18>
||         MPYSP   .M2X    B5,A3,B4          ; |47| <1,12>
||         MPYSP   .M1X    B9,A4,A5          ; |47| <1,12>
||         ADDSP   .L2     B4,B10,B10        ; |47| <1,12>  ^
||         LDW     .D1T1   *+A6(24),A5       ; |47| <2,6>
||         LDW     .D2T2   *++B12(32),DP     ; |47| <3,0>
   [ A1]   SUB     .S1     A1,1,A1           ; |48| <1,13>
||         ADDSP   .L2     B8,B0,B0          ; |47| <1,13>  ^
||         MPYSP   .M1X    B6,A3,A4          ; |47| <2,7>
||         LDW     .D1T1   *+A6(28),A4       ; |47| <2,7>
||         MPYSP   .M2     B7,DP,B4          ; |47| <2,7>
||         LDDW    .D2T2   *B11++(32),B7:B6  ; |47| <3,1>
;** --*
L3:    ; PIPED LOOP EPILOG
           MV      .S1X    SP,A9             ; |55|
||         ADDSP   .L2     B7,B1,B1          ; |47| (E) <3,14>  ^
           ADDSP   .L2     B4,B2,B2          ; |47| (E) <3,15>  ^
           ADDSP   .L1     A5,A0,A0          ; |47| (E) <3,16>  ^
||         ADDSP   .L2     B4,B13,B13        ; |47| (E) <3,16>  ^
           LDDW    .D2T2   *+SP(8),B11:B10   ; |55|
||         MV      .S1X    B10,A8
||         MV      .S2X    A8,B4
           MV      .S2X    A10,DP            ; restore dp
           MVC     .S2     B4,CSR            ; interrupts on
           LDDW    .D2T2   *+SP(16),B13:B12  ; |55|
||         ADDSP   .L1X    A8,B13,A6         ; |54|
           NOP             3
           LDW     .D1T1   *+A9(4),A14       ; |55|
||         MV      .S2X    A14,B3            ; |55|
||         ADDSP   .L1X    B3,A6,A3          ; |54|
           LDW     .D2T1   *++SP(24),A10     ; |55|
           NOP             2
           ADDSP   .L1     A7,A3,A3          ; |54|
           NOP             3
           ADDSP   .L1     A0,A3,A0          ; |54|
           NOP             3
           ADDSP   .L1X    B2,A0,A0          ; |54|
           NOP             3
           ADDSP   .L1X    B1,A0,A0          ; |54|
           NOP             1
           RET     .S2     B3                ; |55|
           NOP             1
           ADDSP   .L1X    B0,A0,A4          ; |54|
           NOP             3
           ; BRANCH OCCURS                   ; |55|

Extra Comments
---------------

a. The stack frame is always decremented by a multiple of 8 bytes. The listing above saves six words (A10, A14, B10, B11, B12, B13) in a 24-byte frame. Had the code needed to store only 3 words, say A10, B10 and B11, 12 bytes of stack storage would have been adequate, but the stack would still be decremented by 16 bytes, in order to leave the incoming double-word aligned stack frame double-word aligned at the end of the transaction.

b. You only need to save A10-A15 and B10-B15 if these are being modified by your code. You need not worry about other registers you are modifying.

c. Also notice the comments "interrupts off". This is where the compiler turns off the GIE bit of CSR to guarantee that interrupts don't mess you up, now that you are no longer in single register assignment mode.

d. Also notice how two epilog stages and one prolog stage have been collapsed to achieve code-size reductions.

e. The compiler finds a 6 cycle loop in which 8 multiplies are performed, to achieve 1.3 multiplies/cycle. The reason this is not a 4 cycle loop is that the input array can only be assumed to be word aligned, as only one output FIR sample is computed at a time. The filter array can be assumed to be double-word aligned.

f. Notice from the compiler feedback {-mw flag} how the resource bound of 6 is reached (the entries marked * in the resource partition, including the .D units on both sides).

g. Also, if one could compute even two FIR output samples at a time, then specifying _nassert((int)(x)%8 == 0); in the C code and computing the two output samples in parallel would give better multiplier utilization.

Circular buffer
---------------

The delay line is modeled by keeping the input samples in an input array of size KN and, after computing (K-1)N output samples, copying the last N-1 samples to the head of the array. Since the memcpy is done once for every (K-1)N output samples, it is inexpensive, as opposed to maintaining the delay line manually. This avoids having to incorporate circular buffering explicitly in your code.

    X: <-N input samples->|<N input samples->|..............|<N input samples>|

If you still want to use circular buffering, you could use serial assembly to do so.

Regards
Jagadeesh Sankaran
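The copy-based delay line described in this post can be sketched roughly as follows. This is a hedged illustration, not the author's code: `fir_dot` and `fir_block` are hypothetical names, the block length `B` stands in for the (K-1)N figure, and the buffer layout is an assumption spelled out in the comments.

```c
#include <string.h>

/* Dot product over N taps, the same computation as the fir() routine. */
static float fir_dot(const float *x, const float *h, int N)
{
    float sum = 0.0f;
    int i;
    for (i = 0; i < N; i++)
        sum += x[i] * h[i];
    return sum;
}

/* Hypothetical block driver. buf holds (N - 1) + B samples:
     buf[0 .. N-2]      history carried over from the previous block
     buf[N-1 .. N+B-2]  the B new input samples for this block
   Writes B outputs to y, then moves the last N-1 samples to the head
   of buf so the next block can be appended starting at buf[N-1]. */
void fir_block(float *buf, const float *h, float *y, int N, int B)
{
    int n;
    for (n = 0; n < B; n++)
        y[n] = fir_dot(&buf[n], h, N);
    /* memmove rather than memcpy, in case B < N-1 and the regions overlap */
    memmove(buf, &buf[B], (size_t)(N - 1) * sizeof(float));
}
```

The copy costs N-1 moves per B outputs, so a larger block makes it proportionally cheaper, which is the point made about doing the memcpy once per block.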

Wojciech-

> but what do you mean that the stack pointer needs to
> be double word aligned? isn't the stack pointer B15?
> how can that be double word aligned? or do you mean
> that I should push/pop always double words? hm... is
> it even possible? could you please elaborate on that?

Sub 8 and AND with 0xfffffff8 before doing any additional push/pop. Watch out for alignment upon return.

-Jeff
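A minimal C model of the adjustment Jeff describes, treating the stack pointer as a plain 32-bit value. `align_sp_down` is a hypothetical name; on the DSP itself this would be two instructions operating on B15.

```c
#include <stdint.h>

/* Hypothetical model: step the stack pointer value down by 8 bytes,
   then clear the low three bits. The result is a multiple of 8 and
   lies between 8 and 15 bytes below the original value. */
uint32_t align_sp_down(uint32_t sp)
{
    return (sp - 8u) & 0xfffffff8u;
}
```

Because the amount removed depends on the incoming value, the original pointer cannot be recovered by adding a constant, which is presumably what the warning about alignment upon return refers to.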

First of all, I would still strongly advise folks not to give up hope on the tools and automatic code generation in a flash. I will illustrate the perils, based on the code that has been developed at:

http://www.wrewers.karolin.pl/firc.asm

This code has several issues. I am not trying to nit-pick. Developing hand-optimized VLIW code has its perils. Let the tools do their job. I will list some of the bugs that immediately catch my eye. I cannot vouch that I have caught all of them. BTW, all these bugs can be avoided by using the tools, so that one does not have to become intricately familiar with the architecture to develop a simple FIR.

a. The stack pointer needs to be double-word aligned at all times. The stores to the stack need to be done by pre-decrementing the stack frame and leaving it double-word aligned.

b. The save-on-entry registers A10-A15 and B10-B15 need to be saved, for sure, upon entry to a function.

c. This code is not single register assignment, and hence you need to turn off interrupts while you are in this code.

d. This single cycle loop as shown could have been achieved with the tools, without a doubt. Further, if you allow at least two output samples to be computed in parallel, you can get 100% multiplier utilization.

e. Most FIR implementations are written to perform block processing, which is why the serial ports are buffered: McBSP, with the B for Buffered.

f. Also, the delay line is implemented once as a memcpy, by moving the block of samples required for overlap, without explicitly doing it in the kernel. This removes the need for circular buffering.

g. Take a look at codec_edma.c under the DSK directory for an example of block-based interaction with the serial port.

By the way, I have not seen anything spectacular to give up on the tools yet!

Regards
Jagadeesh Sankaran
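Item d mentions computing two output samples in parallel. A rough C sketch of that idea follows; `fir2` is a hypothetical name and this is not the code the compiler or the author produced. Each filter tap and each input sample is loaded once and feeds both accumulators, so fewer loads are needed per multiply. No cycle counts are claimed for it.

```c
/* Hypothetical sketch: compute two adjacent outputs in one pass,
     y0 = sum over i of x[i]   * h[i]
     y1 = sum over i of x[i+1] * h[i]
   x must hold at least N + 1 samples. */
void fir2(const float *x, const float *h, int N, float *y0, float *y1)
{
    float s0 = 0.0f, s1 = 0.0f;
    float xi = x[0];
    int i;
    for (i = 0; i < N; i++) {
        float xn = x[i + 1];   /* loaded once, used by both outputs */
        s0 += xi * h[i];
        s1 += xn * h[i];
        xi = xn;               /* becomes the "current" sample next time */
    }
    *y0 = s0;
    *y1 = s1;
}
```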