DSPRelated.com
Forums

c64x software pipeline

Started by Anand K August 27, 2003
Hello experts:

I am working on porting some code from 62x to 64x processors
and there are a few things on which I would like your opinion. Most
of these pertain to the SIMD instructions available for c64x.

1.
In a thread of emails regarding the correct set of steps to write
optimized code, the following steps were listed by an expert:

* natural C code: text book implementation
* optimized C code: C code with advanced loop-level optimizations and
pragmas
* intrinsic C code: C code with intrinsics
* Serial assembly code: Linear sequence of assembly instructions
* Partitioned Serial assembly code: Code with .1's and .2's to guide
optimizer
* Hand code: If needed.

My question with c64x porting is: are there ways to write natural or
optimized C (with no intrinsics) that make the compiler use packed
instructions? If there aren't any, is it good to start with
intrinsic C code as the first step of development? [Assuming the
application performs better using packed instructions.]

2.
I found an interesting thing when trying to load unaligned double
words: the iteration interval (ii) was always higher if I
specified the unaligned double-word load explicitly than if I
specified two unaligned word loads. Checking the list file (*.lst)
showed that both listings had an LDNDW!

Related question: one of the compiler feedback messages (in the
higher-ii case) is
- "Inserted to break DPG cycle".
What might this mean? From the Web I gathered that DPG refers to a
precedence graph, but I don't understand the context, and this one is
not documented in spru187 (Optimizing C Compiler User's Guide).

3.
Are there any recommendations on when to use unaligned loads and
when to use aligned loads?

4.
I am curious to know if there are any general guidelines on using
packed instructions, particularly cases where the compiler used
packed instructions just by looking at the processor specification.
I would also like to hear about applications where packed
instructions fared worse than unpacked ones.

Thanks in advance for sharing your views and ideas,

Regards
ka



My few cents on this....
 
Using #pragmas to align the data and to pass on useful information, such as the minimum trip count of a loop, can indeed help the compiler select packed data processing. Looking at what the compiler gives you back, you can always switch to intrinsics to see whether there is any further improvement. In addition, the true power of the C64x comes out with data types such as unsigned char, unsigned short, etc.
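
To make the pragma suggestion concrete, here is a minimal sketch of a plain C loop (no packed intrinsics) that hands the compiler alignment and trip-count facts. The kernel name vec_add16, the buffer type and the chosen trip-count limits are made up for illustration. MUST_ITERATE and _nassert are TI compiler features; on any other compiler the sketch disables them so that it still builds. Whether the compiler actually picks packed instructions for this loop is something to confirm in the software-pipelining feedback, not something this sketch guarantees.

```c
#include <assert.h>

/* _nassert() and MUST_ITERATE belong to the TI C6000 compiler.
   Elsewhere this sketch makes _nassert() a no-op and skips the pragma. */
#ifndef _TMS320C6X
#define _nassert(cond) ((void)0)
#endif

/* Hypothetical kernel: element-wise sum of two 16-bit arrays.
   Assumptions made by the hints below (not checked at run time):
   n is a multiple of 4 between 8 and 1024, the pointers are
   8-byte aligned, and the arrays do not overlap. */
void vec_add16(const short *restrict a, const short *restrict b,
               short *restrict out, int n)
{
    int i;

    _nassert(((int)a & 7) == 0);
    _nassert(((int)b & 7) == 0);
    _nassert(((int)out & 7) == 0);

#ifdef _TMS320C6X
    #pragma MUST_ITERATE(8, 1024, 4)
#endif
    for (i = 0; i < n; i++)
        out[i] = (short)(a[i] + b[i]);
}

/* The union forces 8-byte alignment of the short array in a portable way. */
typedef union { short s[8]; double force_align; } buf16x8;

/* Returns 1 if the kernel produces the expected sums, 0 otherwise. */
int vec_add16_selfcheck(void)
{
    buf16x8 a, b, out;
    int i;

    for (i = 0; i < 8; i++) {
        a.s[i] = (short)(i * 100);
        b.s[i] = (short)(i - 4);
    }
    vec_add16(a.s, b.s, out.s, 8);
    for (i = 0; i < 8; i++)
        if (out.s[i] != (short)(i * 100 + i - 4))
            return 0;
    return 1;
}
```

The loop body stays "natural" C; only the hints around it change, so the same source can be compared against an intrinsic version later.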
 
http://cs-tr.cs.rice.edu/Dienst/Repository/2.0/Body/ncstrl.rice_cs/TR02-410/postscript
The paper talks about a scheduling algorithm implementation on the C6200; it might give some pointers as to why the DPG cycle was broken.
 
To answer #3 of your questions, look at page 6-40 of the Programmer's Guide, "When to use Non-aligned memory accesses"; it is a trade-off between bandwidth and the amount of vectorization you require.
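
As a portable illustration of the two access patterns from question #2 (one unaligned double-word load versus two unaligned word loads), here is a small host-side sketch. It is not TI code: the helper names are invented, and memcpy stands in for the non-aligned load instructions (on the TI compiler such accesses are normally written with intrinsics such as _mem4 and _memd8). The sketch only shows that both patterns fetch the same eight bytes, so the choice between them is about bandwidth and scheduling, not about results.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: read 4 or 8 bytes from an address with no
   alignment requirement. memcpy keeps the access well defined in C. */
uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

uint64_t load_u64_unaligned(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Returns 1 if one 8-byte load and two 4-byte loads from the same
   odd address yield identical bytes, 0 otherwise. */
int unaligned_load_selfcheck(void)
{
    unsigned char buf[16];
    uint32_t halves[2];
    uint64_t whole;
    int i;

    for (i = 0; i < 16; i++)
        buf[i] = (unsigned char)(i * 17);

    whole     = load_u64_unaligned(buf + 1);   /* bytes 1..8 in one access */
    halves[0] = load_u32_unaligned(buf + 1);   /* bytes 1..4 */
    halves[1] = load_u32_unaligned(buf + 5);   /* bytes 5..8 */

    /* Comparing raw bytes avoids any dependence on endianness. */
    return memcmp(&whole, halves, sizeof whole) == 0;
}
```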
 
Packed instructions are generally used in efficient & optimized implementations of multimedia algorithms. An implementation benefits from such instructions if the underlying algorithm is suited for packed data processing.
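
For readers new to packed data processing, the following is a rough host-side emulation of what a packed 16-bit add does: two independent 16-bit additions carried in one 32-bit word, with no carry from the low half into the high half. The function names add2_emul and pack2 are invented for this sketch; on the C64x the work would be done by a single packed instruction rather than by the shifts and masks shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Emulated packed add: treats each 32-bit word as two 16-bit lanes
   and adds the lanes independently (each lane wraps modulo 2^16). */
uint32_t add2_emul(uint32_t a, uint32_t b)
{
    uint32_t lo = (a + b) & 0xFFFFu;              /* low lane, carry discarded */
    uint32_t hi = ((a >> 16) + (b >> 16)) << 16;  /* high lane, shifted back up */
    return hi | lo;
}

/* Builds a packed word from a high and a low 16-bit lane. */
uint32_t pack2(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}

/* Lanes (1, 0xFFFF) + (1, 1) give (2, 0): the low lane wraps around
   and nothing leaks into the high lane. Returns 1 on success. */
int add2_selfcheck(void)
{
    return add2_emul(pack2(1, 0xFFFF), pack2(1, 1)) == pack2(2, 0);
}
```

An algorithm suits packed processing when its data naturally falls into such independent narrow lanes, which is why 8-bit and 16-bit multimedia data is the usual beneficiary.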
 
cheers,
indrajit
 

_____________________________________
Note: If you do a simple "reply" with your email client, only the author of this message will receive your answer. You need to do a "reply all" if you want your answer to be distributed to the entire group.

_____________________________________
About this discussion group:

To Join: Send an email to c...@yahoogroups.com

To Post: Send an email to c...@yahoogroups.com

To Leave: Send an email to c...@yahoogroups.com

Archives: http://www.yahoogroups.com/group/c6x

Other Groups: http://www.dsprelated.com



ka,
 
I will try to answer #1 and combine it with some of my personal philosophy.
 
If you have the luxury... After I have designed my code [and possibly done some prototyping and proof of concept], I get it working in C. I preserve and update the C code so that I can run comparison tests in the event that I need to add a significant amount of asm code [which is a great opportunity for errors]. I always think that I know where the bottlenecks or the code bloat are located, but I try to benchmark/profile my code [ideally a debug, a max-speed and a min-size build; it looks like the CCS 2.20 profiler does this for me] to get an objective opinion. So often the 80/20 rule, or some slight variation of it, holds true [there are obviously exceptions]. By doing this, I can assess my situation.
 
Is my "problem" code size or speed? [It is usually some of each.]
 
Observing the code generated for a debug build and for an optimized build will give some idea of what and how the compiler optimizes, although it seems able to generate code that I do not recognize because of its optimization techniques.
 
I take the "tallest pole or two" and start addressing it, check my results, and continue the process. If I get into a situation that requires severe optimization, I always consider the product life cycle: how is someone going to test and maintain this code?
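
The idea of keeping the working C version around for comparison tests could look roughly like the sketch below. The function names and the test data are invented for illustration; dotprod_opt stands in for whatever hand-tuned version (intrinsics, linear assembly, hand code) eventually replaces the reference, and here it is just a 4x unrolled C loop.

```c
#include <assert.h>

/* Reference version: the plain C code that is kept and maintained. */
long dotprod_ref(const short *a, const short *b, int n)
{
    long sum = 0;
    int i;

    for (i = 0; i < n; i++)
        sum += (long)a[i] * b[i];
    return sum;
}

/* Stand-in for the optimized version under test.
   Assumes n is a multiple of 4. */
long dotprod_opt(const short *a, const short *b, int n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int i;

    for (i = 0; i < n; i += 4) {
        s0 += (long)a[i]     * b[i];
        s1 += (long)a[i + 1] * b[i + 1];
        s2 += (long)a[i + 2] * b[i + 2];
        s3 += (long)a[i + 3] * b[i + 3];
    }
    return s0 + s1 + s2 + s3;
}

/* Runs both versions over several lengths and counts disagreements. */
int count_mismatches(void)
{
    enum { N = 64 };
    short a[N], b[N];
    int i, n, bad = 0;

    for (i = 0; i < N; i++) {
        a[i] = (short)(i * 37 - 1000);
        b[i] = (short)(500 - i * 11);
    }
    for (n = 0; n <= N; n += 4)
        if (dotprod_ref(a, b, n) != dotprod_opt(a, b, n))
            bad++;
    return bad;
}
```

Rerunning a check like count_mismatches() after every optimization step gives a cheap regression test, which also helps whoever has to maintain the tuned code later.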
 
okay, I am getting off of my soapbox...
 
#2. A couple of useful documents are:
Code Coverage and Multi-event Profiler User's Guide (Rev. A) spru624
Using Code Coverage & Multi-event Profiler for Robustness & Efficiency Analysis spra868
CCS 2.20 has a fairly nice tool for profiling/tuning your code.
 
#3. My preference is to ignore alignment [let the compiler handle it] until I have correctly working code.
 
#4. I do not know a general answer to this. It is my belief that the compiler will use the best instruction sequence that it can, regardless of instruction type, as long as you tell the compiler that it is a c64 :-) [But since I have never been "cramped for memory" in my limited c64x experience, I do not know for sure.]
 
good luck,
mikedunn
