c6x | Question about optimization

Hi,

1 - I am implementing a loop in assembly for code that was originally written in C. When I compile the C code with the -k (keep generated assembly files), -mw, and -o3 compiler options, the compiler generates the "SOFTWARE PIPELINE INFORMATION". I know the same is true when using linear assembly. However, what happens if I am writing directly in assembly? Why isn't this information available? How can I pipeline the assembly code without the compiler's tips?

2 - I wish to reference a variable that was defined in C as:

    #define numOfDataPoints 1024

in my assembly code as:

    .ref _numOfDataPoints

The compiler generates an error that the symbol is undefined. Declaring the variable as:

    const numOfDataPoints=1024;

solved the issue; however, it generated a different error, since I am using the value to define the size of an array, and the array size must be a constant, so the compiler did not allow it. Any ideas?

Thanks,
Shlomo.

Reply by Andrew Elder ●November 9, 2005

Shlomo,

Can you post the C code here for the group to look at (and comment on)? What processor are you using?

Consider ASM a last resort. You should be able to get the C compiler to generate VERY close to optimal code if you have everything organized correctly. You should be able to get VERY VERY VERY close to optimal with linear assembly.

- Andrew

Reply by shlomo_kashani ●November 9, 2005

Dear Andrew and Gregory,

Thanks for the prompt reply. I have just moved the assembly code to linear assembly (took me some time ...). Generally speaking I am using the 6711, but now I am trying the simulator as I do not have a parallel port on my laptop (only USB ...). Per your request, I am posting the C code, the assembly code, and the linear assembly code.

Just to make it easier to understand, the input is:

    y[] = {0.220649, 0.439838, 0.656116, 0.868058,
           1.074272, 1.273410, 1.464181, 1.645360}

and the output should be:

    x[] = {0.220649, 0.0, 0.439838, 0.0, 0.656116, 0.0, 0.868058, 0.0,
           1.074272, 0.0, 1.273410, 0.0, 1.464181, 0.0, 1.645360}

Linear assembly:

            .global _load_array         ; declare the function as global so that it
                                        ; can be referenced from C
    _load_array .cproc x_addr, y_addr
            .reg    memOffset
            .reg    FFTPnts
            .reg    jCnt
            .reg    zeroValue
            .reg    x_value
            .reg    y_value

            ZERO    memOffset           ; offset register, used for pointer arithmetic
            ZERO    jCnt                ; used for generating the values of memOffset
            ZERO    zeroValue           ; used for setting odd terms to zero
            MVK     2048,FFTPnts        ; numOfFFTDataPoints=2048
            .no_mdep                    ; no
    loop:   .trip 1024, 1024            ; assume between 2048/2 and 2048/2 iterations
            SHL     jCnt,1,memOffset    ; shift left (multiply jCnt by two-->memOffset)
            ADD     1,jCnt,jCnt         ; jCnt++
            LDW     *y_addr++,y_value   ; load a 32-bit word into y_value, post-increment
                                        ; the y_addr pointer
            STW     y_value,*x_addr[memOffset]   ; store the value into x_addr offset by memOffset
            ADD     1,memOffset,memOffset        ; memOffset++. memOffset is now an odd term
            STW     zeroValue,*x_addr[memOffset] ; store the zero value into x_addr offset by memOffset,
                                                 ; which is now an odd term
            SUB     FFTPnts,2,FFTPnts   ; loop decrement by 2
    [FFTPnts] B     loop                ; continue until all points computed
            .endproc

Assembly:

            .def    _load_array
            .text                       ; text section
            ;.ref   _numOfDataPoints
    _load_array:
    ;       MVK     _numOfDataPoints,A1
            MVK     1024,A1
    ||      ZERO    A8                  ; offset register, used for pointer arithmetic
            ZERO    A9                  ; (j)
    ||      ZERO    B9
    LOOP:
            SHL .S1 A9,1,A8             ; shift left (multiply A9 by two-->A8)
    ||      ADD .L1 1,A9,A9             ; A9++
            LDW .D2 *B4++,B2
            NOP     4
            STW .D1 B2,*A4[A8]          ; A8 is an even term
            NOP     3
            ADD .L1 1,A8,A8             ; A8++. A8 is now an odd term
            STW .D1 B9,*A4[A8]
            NOP     3
            SUB .S1 A1,2,A1
    [A1]    B   .S2 LOOP
            NOP     5
            B   .S2 B3
            NOP     5
            .end

C:

    j=0;
    for (i=0;i<2048;i+=2){
        data[i]=Input_Signal[j];
        j=j+1;
        data[i+1]=0;
    }

Thanks,
Shlomo.

Reply by Andrew Elder ●November 9, 2005

Shlomo,

This

    void zeropad(float *Input_Signal, float *data)
    {
        int i;
        for (i=0;i<2048;i+=2){
            data[i]=Input_Signal[i/2];
            data[i+1]=0;
        }
    }

gives

    L11:    ; PIPED LOOP PROLOG
    ;** --*
    L12:    ; PIPED LOOP KERNEL
       [!A1]   LDW     .D1T1   *+A6[A5],A5       ; ^ |238|
               NOP             4

       [ B0]   SUB     .L2     B0,1,B0           ; |240|
    ||         SHR     .S2     B6,2,B8           ; |238|
    || [!A1]   STW     .D2T2   B4,*++B7(16)      ; |239|
    || [!A1]   STW     .D1T1   A5,*++A3(16)      ; ^ |238|

       [!A1]   STW     .D1T1   A4,*+A3(12)       ; |239|
    || [ B0]   B       .S1     L12               ; |240|
    || [!A1]   LDW     .D2T2   *+B5[B8],B8       ; ^ |238|

               NOP             3
               ADD     .D1     8,A0,A0           ; |240|

       [ A1]   SUB     .D1     A1,1,A1           ;
    ||         ADD     .S2     8,B6,B6           ; |240|
    || [!A1]   STW     .D2T2   B8,*+B7(4)        ; ^ |238|
    ||         SHR     .S1     A0,2,A5           ; @|238|
    ;** --*
    L13:    ; PIPED LOOP EPILOG

This

    void zeropad(const float *Input_Signal, float *data)
    {
        int i;
        for (i=0;i<2048;i+=2){
            data[i]=*Input_Signal++;
            data[i+1]=0;
        }
    }

gives

    ;** --*
    L12:    ; PIPED LOOP KERNEL
               LDW     .D2T2   *++B5(8),B7       ; @|241|
    || [ B0]   B       .S1     L12               ; @|243|
    ||         LDW     .D1T1   *++A4(8),A5       ; @@|241|

       [!A1]   STW     .D2T2   B4,*++B6(16)      ; |242|
    || [!A1]   STW     .D1T1   A5,*++A0(16)      ; |241|

       [ A1]   SUB     .S1     A1,1,A1           ;
    || [!A1]   STW     .D2T2   B7,*+B6(4)        ; |241|
    || [!A1]   STW     .D1T1   A3,*+A0(12)       ; |242|
    || [ B0]   SUB     .S2     B0,1,B0           ; @@|243|
    ;** --*
    L13:    ; PIPED LOOP EPILOG

This is 3 clocks to write 4 values in the data[] array. Not too bad in my opinion, and you won't do any better using assembly language or linear assembly.

- Andrew E.
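As a follow-on to the two C variants above, here is a minimal sketch of the same zero-interleave loop written with C99 `restrict` qualifiers and an explicit element count. The function name `zeropad_restrict` and the parameter `n` are illustrative and do not appear in the thread. The `restrict` qualifiers only promise the compiler that the input and output buffers do not overlap; whether that changes the schedule the compiler produces is not something this thread verifies.

```c
#include <stddef.h>

/*
 * Copy n input samples into the even slots of data[] and write 0.0f
 * into the odd slots. data must have room for 2*n floats.
 * The name zeropad_restrict and the parameter n are hypothetical.
 */
void zeropad_restrict(const float *restrict input_signal,
                      float *restrict data,
                      size_t n)
{
    size_t j;

    for (j = 0; j < n; j++) {
        data[2 * j]     = input_signal[j];  /* even slot: sample  */
        data[2 * j + 1] = 0.0f;             /* odd slot: zero pad */
    }
}
```

Calling it with `n` set to 1024 would fill the same 2048-element `data[]` array as the loops above.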


Reply by shlomo_kashani ●November 9, 2005

Andrew,

Thanks for using -k for me :)

The *Input_Signal++ line is exactly what I am using in linear assembly as LDW *y_addr++,y_value;.

The point here is to write in assembly or linear assembly to get familiarized with its usage. I am still interested in compiler-generated optimized code from the linear assembly code (rather than optimized code based on C code).

I am using these compiler options:

    ;*   Architecture      : TMS320C671x
    ;*   Optimization      : Enabled at level 3
    ;*   Optimizing for    : Speed
    ;*                       Based on options: -o3, no -ms
    ;*   Endian            : Little
    ;*   Interrupt Thrshld : Disabled
    ;*   Data Access Model : Far Aggregate Data
    ;*   Pipelining        : Enabled
    ;*   Speculate Loads   : Disabled
    ;*   Memory Aliases    : Presume not aliases (optimistic)
    ;*   Debug Info        : Optimized w/Profiling Info

But when I look at the generated asm code, I see that the compiler has stripped the linear assembly code and embedded it inside the main function. Now it is impossible for me to view the pipelined assembly code for the function (and loop) as I did before (or as you posted). This is what the compiler generates:

               CALL    .S1     _my_asm_function  ; |24|
    ||         STW     .D2T1   A15,*+SP(8256)

               STW     .D2T1   A11,*+SP(8220)
               STW     .D2T1   A10,*+SP(8216)

               STW     .D1T1   A13,*-A9(28)
    ||         STW     .D2T2   B11,*+SP(8244)
    ||         MVKL    .S1     _Input_Signal,A3  ; |24|

               STW     .D1T1   A12,*-A9(32)
    ||         MVKL    .S2     RL0,B3            ; |24|
    ||         STW     .D2T2   B12,*+SP(8248)
    ||         MVKH    .S1     _Input_Signal,A3  ; |24|

               MVKH    .S2     RL0,B3            ; |24|
    ||         STW     .D1T1   A14,*-A9(24)
    ||         MV      .L2X    A3,B4             ; |24|
    ||         STW     .D2T2   B13,*+SP(8252)
    ||         ADD     .L1X    12,SP,A4          ; |24|

    RL0:       ; CALL OCCURS {_my_asm_function}  ; |24|
               MVK     .S1     0x1,A7            ; |35|

How can I avoid the stripping? Which compiler option controls that?

Shlomo.

Reply by Jagadeesh Bhaskar P ●November 10, 2005

Hi,

This is regarding your question 2.

> 2-I wish to reference a variable that was defined in C as:
> #define numOfDataPoints 1024

Let me just explain the effect of "#define". It does not define a variable; it creates a macro. So the name is never linked; it is textually replaced at every point of occurrence.

> In my assembly code as:
> .ref _numOfDataPoints

Now you are trying to refer to a variable which doesn't exist (because, before the linker ever runs, the preprocessor has replaced all occurrences of numOfDataPoints with 1024).

> The compiler generates an error that the symbol is undefined.

Obvious!!

> Declaring the variable as: const numOfDataPoints=1024;

Now you are declaring a variable; with #define you hadn't. (BTW it is better to mention the data type also, like "const int numOfDataPoints = 1024".)

> solved the issue however it generated a different error since I am
> using the value for defining the size of an array and array size must
> be a const so the compiler did not allow it. Any ideas ?

Just use "int numOfDataPoints=1024;"

HTH
--
jag
"Quaerendo invenietis"
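The macro-versus-variable distinction can be sketched in C as follows. This is only one possible arrangement, under stated assumptions: the macro name `NUM_OF_DATA_POINTS`, the array `data`, and the helper `data_length` are hypothetical and not from the thread. The macro stays a compile-time constant that can size an array, while a separate `const int` object gives the linker a real symbol for a `.ref` in assembly to resolve. With the leading-underscore naming used in this thread, that symbol would presumably appear as `_numOfDataPoints`, and it would name the object's address, so the assembly side would have to load the value from memory and could not use the symbol as an immediate.

```c
#include <stddef.h>

/* Compile-time constant: replaced by the preprocessor, emits no symbol.
   (Hypothetical name, chosen so it does not clash with the variable.) */
#define NUM_OF_DATA_POINTS 1024

/* Real object with external linkage: this is what a linker can resolve. */
const int numOfDataPoints = NUM_OF_DATA_POINTS;

/* Array sizes must be constant expressions, so the macro is used here,
   not the const object. */
float data[2 * NUM_OF_DATA_POINTS];

/* Hypothetical helper: number of elements in data[]. */
size_t data_length(void)
{
    return sizeof data / sizeof data[0];
}
```

Keeping the literal in one macro means the array size and the exported object cannot drift apart.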

Reply by Bhooshan Iyer ●November 26, 2005

Shlomo--

You cannot control assembly code using compiler options. When you write assembly by hand, you are literally scheduling the 16-stage C67xx DSP pipeline yourself, and you don't get any help from the code generation tools at that level. All that help stops at linear assembly. With hand-coded assembly, what you write is what you get.

--Bhooshan
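To put rough numbers on "what you write is what you get", the sketch below compares the hand-written loop posted earlier in the thread with the 3-cycle compiler kernel Andrew showed. It assumes each execute packet takes one cycle, each `NOP n` takes n cycles, and there are no memory stalls; prolog, epilog, and call overhead are ignored. The function and constant names are illustrative only.

```c
/*
 * Estimate loop cycles as (iterations) * (cycles per iteration).
 * Ignores prolog/epilog, stalls, and call overhead (assumption).
 */
unsigned long loop_cycles(unsigned long total_words,
                          unsigned long words_per_iter,
                          unsigned long cycles_per_iter)
{
    return (total_words / words_per_iter) * cycles_per_iter;
}

enum {
    /* Hand-written loop body, one term per line of the posted listing:
       SHL||ADD, LDW, NOP 4, STW, NOP 3, ADD, STW, NOP 3, SUB, B, NOP 5 */
    HAND_ASM_CYCLES_PER_ITER = 1 + 1 + 4 + 1 + 3 + 1 + 1 + 3 + 1 + 1 + 5,
    HAND_ASM_WORDS_PER_ITER  = 2,  /* one sample plus one zero */

    /* Compiler kernel from the second zeropad() variant */
    PIPED_CYCLES_PER_ITER    = 3,
    PIPED_WORDS_PER_ITER     = 4
};
```

Under these assumptions, filling 2048 output words costs 22528 cycles with the unscheduled hand-written loop and 1536 cycles in the pipelined kernel, which is the gap that manual scheduling would have to close.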

