DSPRelated.com
Forums

Question about optimization

Started by shlomo_kashani November 9, 2005
Hi,
1-I am implementing a loop in assembly for code that was originally
written in C. When I compile the C code with the -k (keep generated
assembly files), -mw and -o3 options, the compiler generates the
"SOFTWARE PIPELINE INFORMATION" block. I know the same happens with
linear assembly. However, what happens if I am writing directly in
assembly? Why isn't this information available? How can I pipeline
the assembly code without the compiler's tips ... ?

2-I wish to reference a variable that was defined in C as:
#define numOfDataPoints 1024

In my assembly code as:

.ref _numOfDataPoints

The compiler generates an error that the symbol is undefined.
Declaring the variable as: const numOfDataPoints=1024;
solved the issue, however it generated a different error: I am
using the value to define the size of an array, and an array size must
be a constant, so the compiler did not allow it. Any ideas ?
Thanks,

Shlomo.



Shlomo,

Can you post the C code here for the group to look at (and comment on) ?

What processor are you using ?

Consider ASM a last resort. You should be able to get the C compiler to generate VERY close to optimal code if you have everything organized correctly. You should be able to get VERY VERY VERY close to optimal with linear assembly.

- Andrew





Dear Andrew and Gregory,
Thanks for the prompt reply.
I have just moved the assembly code to Linear assembly (took me some
time ... ). Generally speaking I am using the 6711 but now I am
trying the simulator as I do not have a parallel port on my laptop
(only USB ...).

Per your request, I am posting the C code, the assembly code, and the
linear assembly code.
Just to make it easier to understand, the input is:

y[]={0.220649,
;    0.439838,
;    0.656116,
;    0.868058,
;    1.074272,
;    1.273410,
;    1.464181,
;    1.645360,
;   }

and the output should be:

x[]={0.220649,
;    0.0,
;    0.439838,
;    0.0,
;    0.656116,
;    0.0,
;    0.868058,
;    0.0,
;    1.074272,
;    0.0,
;    1.273410,
;    0.0,
;    1.464181,
;    0.0,
;    1.645360,
;   };

Linear assembly:

        .global _load_array     ; declare the function as global so that it
                                ; can be referenced from C

_load_array .cproc x_addr, y_addr

        .reg memOffset
        .reg FFTPnts
        .reg jCnt
        .reg zeroValue
        .reg x_value
        .reg y_value

        ZERO memOffset          ; offset register, used for pointer arithmetic
        ZERO jCnt               ; used for generating the values of memOffset
        ZERO zeroValue          ; used for setting odd terms to zero

        MVK 2048,FFTPnts        ; numOfFFTDataPoints=2048
        .no_mdep                ; no
loop:   .trip 1024, 1024        ; assume between 2048/2 and 2048/2 iterations

        SHL jCnt,1,memOffset    ; shift left (multiply jCnt by two-->memOffset)
        ADD 1,jCnt,jCnt         ; jCnt++
        LDW *y_addr++,y_value   ; load a 32-bit word into y_value, post-increment
                                ; the y_addr pointer
        STW y_value,*x_addr[memOffset]   ; store the value into x_addr offset by memOffset
        ADD 1,memOffset,memOffset        ; memOffset++. memOffset is now an odd term
        STW zeroValue,*x_addr[memOffset] ; store the zero value into x_addr offset by memOffset,
                                         ; which is now an odd term
        SUB FFTPnts,2,FFTPnts   ; loop decrement by 2
 [FFTPnts] B loop               ; continue until all points computed

        .endproc

Assembly:

        .def _load_array

        .text                   ; text section

        ;.ref _numOfDataPoints
_load_array:
        ; MVK _numOfDataPoints,A1

        MVK 1024,A1
||      ZERO A8                 ; offset register, used for pointer arithmetic

        ZERO A9                 ; (j)
||      ZERO B9

LOOP:
        SHL .S1 A9,1,A8         ; shift left (multiply A9 by two-->A8)
||      ADD .L1 1,A9,A9         ; A9++

        LDW .D2 *B4++,B2
        NOP 4
        STW .D1 B2,*A4[A8]      ; A8 is an even term
        NOP 3
        ADD .L1 1,A8,A8         ; A8++. A8 is now an odd term
        STW .D1 B9,*A4[A8]
        NOP 3

        SUB .S1 A1,2,A1
 [A1]   B .S2 LOOP
        NOP 5

        B .S2 B3
        NOP 5

        .end

C:

j=0;
for (i=0;i<2048;i+=2){
    data[i]=Input_Signal[j];
    j=j+1;
    data[i+1]=0;
}
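
For anyone who wants to try the loop on a PC first, here is a minimal self-contained sketch of the same operation written as a function. The name zero_interleave and the length parameter n are assumptions for illustration only; they are not part of the code above.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative sketch: copies n input samples into the even slots of out
   and writes zero into the odd slots. out must have room for 2*n floats.
   The function name and signature are hypothetical. */
void zero_interleave(const float *in, float *out, size_t n)
{
    size_t j;

    assert(in != NULL && out != NULL);

    for (j = 0; j < n; j++) {
        out[2 * j]     = in[j];   /* even index: input sample */
        out[2 * j + 1] = 0.0f;    /* odd index: zero */
    }
}
```

Calling it with the first few values of y[] listed earlier should fill x[] with each sample followed by 0.0.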

Thanks,

Shlomo.





Shlomo,

This

void zeropad(float *Input_Signal, float *data)
{
    int i;
    for (i=0;i<2048;i+=2){
        data[i]=Input_Signal[i/2];
        data[i+1]=0;
    }
}

gives

L11:    ; PIPED LOOP PROLOG
;** --*
L12:    ; PIPED LOOP KERNEL
   [!A1]   LDW     .D1T1   *+A6[A5],A5       ;  ^ |238|
           NOP             4

   [ B0]   SUB     .L2     B0,1,B0           ; |240|
||         SHR     .S2     B6,2,B8           ; |238|
|| [!A1]   STW     .D2T2   B4,*++B7(16)      ; |239|
|| [!A1]   STW     .D1T1   A5,*++A3(16)      ;  ^ |238|

   [!A1]   STW     .D1T1   A4,*+A3(12)       ; |239|
|| [ B0]   B       .S1     L12               ; |240|
|| [!A1]   LDW     .D2T2   *+B5[B8],B8       ;  ^ |238|

           NOP             3
           ADD     .D1     8,A0,A0           ; |240|

   [ A1]   SUB     .D1     A1,1,A1           ;
||         ADD     .S2     8,B6,B6           ; |240|
|| [!A1]   STW     .D2T2   B8,*+B7(4)        ;  ^ |238|
||         SHR     .S1     A0,2,A5           ; @|238|

;** --*
L13:    ; PIPED LOOP EPILOG

This
void zeropad(const float *Input_Signal, float *data)
{
    int i;
    for (i=0;i<2048;i+=2){
       data[i]=*Input_Signal++;
       data[i+1]=0;
    }
}

gives

;** --*
L12:    ; PIPED LOOP KERNEL

           LDW     .D2T2   *++B5(8),B7       ; @|241|
|| [ B0]   B       .S1     L12               ; @|243|
||         LDW     .D1T1   *++A4(8),A5       ; @@|241|

   [!A1]   STW     .D2T2   B4,*++B6(16)      ; |242|
|| [!A1]   STW     .D1T1   A5,*++A0(16)      ; |241|

   [ A1]   SUB     .S1     A1,1,A1           ;
|| [!A1]   STW     .D2T2   B7,*+B6(4)        ; |241|
|| [!A1]   STW     .D1T1   A3,*+A0(12)       ; |242|
|| [ B0]   SUB     .S2     B0,1,B0           ; @@|243|

;** --*
L13:    ; PIPED LOOP EPILOG

This is 3 clocks to write 4 values in the data[] array. Not too bad in my opinion and you won't do any better using assembly language or linear assembly.
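
To put that figure in perspective, the arithmetic can be sketched as below. This is only a rough estimate: it assumes every kernel pass costs the same number of cycles and it ignores the prolog, the epilog and any memory stalls. The helper name estimate_kernel_cycles is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: rough cycle estimate for a loop kernel.
   Assumes each pass takes cycles_per_pass cycles and writes
   values_per_pass output values; prolog, epilog and stalls are ignored. */
unsigned estimate_kernel_cycles(unsigned total_values,
                                unsigned values_per_pass,
                                unsigned cycles_per_pass)
{
    unsigned passes;

    assert(values_per_pass != 0);

    /* Round up so that a partial final pass still counts as a full pass. */
    passes = (total_values + values_per_pass - 1) / values_per_pass;
    return passes * cycles_per_pass;
}
```

For the 2048-entry data[] array at 4 values per 3 clocks this gives 512 passes, or about 1536 cycles. Counting the instructions and NOPs in the hand-written loop posted earlier, one pass looks like roughly 22 cycles for 2 values, so the same estimate for it is far higher.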

- Andrew E.


	



Andrew,
Thanks for using -k for me :)

The *Input_Signal++ line is exactly what I am using in linear
assembly as LDW *y_addr++,y_value;.

The point here is to write in assembly or linear assembly to get
familiar with its usage. I am still interested in the compiler-generated
optimized code from the linear assembly (rather than
optimized code based on the C code).

I am using these compiler options:

;* Architecture : TMS320C671x
;* Optimization : Enabled at level 3
;* Optimizing for : Speed
;* Based on options: -o3, no -ms
;* Endian : Little
;* Interrupt Thrshld : Disabled
;* Data Access Model : Far Aggregate Data
;* Pipelining : Enabled
;* Speculate Loads : Disabled
;* Memory Aliases : Presume not aliases (optimistic)
;* Debug Info : Optimized w/Profiling Info

But when I look at the generated asm code I see that the compiler
has stripped the linear assembly code and embedded it inside
the main function. Now it is impossible for me to view the pipelined
assembly code for the function (and loop) as I did before (or as you
posted). This is what the compiler generates:

           CALL    .S1     _my_asm_function  ; |24|
||         STW     .D2T1   A15,*+SP(8256)

           STW     .D2T1   A11,*+SP(8220)
           STW     .D2T1   A10,*+SP(8216)

           STW     .D1T1   A13,*-A9(28)
||         STW     .D2T2   B11,*+SP(8244)
||         MVKL    .S1     _Input_Signal,A3  ; |24|

           STW     .D1T1   A12,*-A9(32)
||         MVKL    .S2     RL0,B3            ; |24|
||         STW     .D2T2   B12,*+SP(8248)
||         MVKH    .S1     _Input_Signal,A3  ; |24|

           MVKH    .S2     RL0,B3            ; |24|
||         STW     .D1T1   A14,*-A9(24)
||         MV      .L2X    A3,B4             ; |24|
||         STW     .D2T2   B13,*+SP(8252)
||         ADD     .L1X    12,SP,A4          ; |24|

RL0:       ; CALL OCCURS {_my_asm_function}  ; |24|
           MVK     .S1     0x1,A7            ; |35|

How can I avoid the stripping? Which compiler option controls that?

Shlomo.


Hi,
This is regarding your question 2.

> 2-I wish to reference a variable that was defined in C as:
> #define numOfDataPoints 1024
  ^^^^^^^
Let me just explain the effect of "#define". It does not actually
define a variable; it defines a macro. So no symbol is ever emitted
for the linker; the name is textually replaced at every point of
occurrence.

> In my assembly code as:
>
> .ref _numOfDataPoints

Now you are trying to refer to a variable which doesn't exist (because,
before the linker ever runs, the preprocessor has replaced all
occurrences of numOfDataPoints with 1024).

> The compiler generates an error that the symbol is undefined.

Obvious!!

> Declaring the variable as: const numOfDataPoints=1024;

Now you are declaring a variable; with #define you hadn't. (BTW it is
better to mention the data type also, like "const int numOfDataPoints = 1024")

> solved the issue however it generated a different error since I am
> using the value for defining the size of an array and array size must
> be a const so the compiler did not allow it. Any ideas ?

Just use "int numOfDataPoints=1024;"

HTH
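
One possible way to get both a usable array size and a linkable symbol is to keep a macro for the size and define a separate object initialised from it. This is a minimal sketch, not the only option: the macro name NUM_OF_DATA_POINTS, the array size 2 * NUM_OF_DATA_POINTS and the helper data_capacity are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Compile-time constant: the preprocessor substitutes 1024, so it can
   be used as an array size. The macro name is illustrative. */
#define NUM_OF_DATA_POINTS 1024

/* A real object with external linkage, so a symbol exists for the linker.
   With TI's COFF tools the assembly-side name is assumed to be
   _numOfDataPoints (leading underscore). */
const int numOfDataPoints = NUM_OF_DATA_POINTS;

/* The array is sized from the macro, not from the variable. */
float data[2 * NUM_OF_DATA_POINTS];

/* Hypothetical helper: number of elements in data[]. */
size_t data_capacity(void)
{
    return sizeof data / sizeof data[0];
}
```

Note that on the assembly side the symbol then stands for the address of the variable, not for the value 1024, so as far as I know the value has to be loaded from memory (for example MVKL/MVKH of the address followed by an LDW) rather than used directly as an MVK constant.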
--
jag
"Quaerendo invenietis"



Shlomo--
You cannot control assembly code using compiler options. When you write hand-coded assembly you are literally scheduling the 16-stage C67xx DSP pipeline yourself, and none of the code generation tools help you at that level. All that help stops with linear assembly. With hand-coded assembly, what you write is what you get.
 
--Bhooshan

 


