DSPRelated.com
Forums

Question about algorithm-to-architecture mapping

Started by John April 16, 2006
Hi,

I was wondering about this:

I have programmed a C-function in VisualDSP++ and the function is
going to be executed on the SHARC ADSP21364 EZKIT evaluation
board.

I want to know what actually happens on a hardware architectural level
when the function executes.

For the sake of simplicity, let's say I have this C-program:

----------------------
float a,b,c;

void main(void)
{
   a=2.3;
   b=5.7;
   c=a+b;
}
-----------------------

Then I would like to know the series of events
that takes place in the actual hardware.

From looking at the C-program I guess the program
uses 3x32bit of data memory. The program itself
probably consists of the following instructions.

1) Set 32-bit cell to 2.3
2) Set 32-bit cell to 5.7
3) Move 2.3 to input 1 of ALU
4) Move 5.7 to input 2 of ALU
5) Ask ALU to add the 2 numbers together
6) Place result in 32-bit cell

If the number of instructions is X and the bit-width
of each instruction is Y then I guess the code takes
up X*Y bits of program space.
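
The six guessed steps and the X*Y estimate above can be sketched as a tiny interpreter. This is only a toy model under stated assumptions: the `ToyMachine`, the four opcodes and the `program` table are hypothetical names made up for illustration and do not describe how the SHARC actually executes the function. The width of 48 bits per instruction is likewise an assumption to be checked against the processor's documentation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy machine, NOT the SHARC: three data cells, two ALU inputs. */
enum { ADDR_A = 0, ADDR_B = 1, ADDR_C = 2, NUM_CELLS = 3 };

typedef enum { OP_SET, OP_LOAD, OP_ADD, OP_STORE } Opcode;

typedef struct {
    Opcode op;
    int    reg;   /* ALU input index (used by OP_LOAD) */
    int    addr;  /* data-memory cell (used by OP_SET, OP_LOAD, OP_STORE) */
    float  imm;   /* constant (used by OP_SET) */
} Instr;

typedef struct {
    float mem[NUM_CELLS]; /* data memory holding a, b, c */
    float alu_in[2];      /* the two ALU inputs */
    float alu_out;        /* ALU result */
} ToyMachine;

/* The six steps guessed in the post, one table entry per step. */
const Instr program[] = {
    { OP_SET,   0, ADDR_A, 2.3f }, /* 1) set cell a to 2.3      */
    { OP_SET,   0, ADDR_B, 5.7f }, /* 2) set cell b to 5.7      */
    { OP_LOAD,  0, ADDR_A, 0.0f }, /* 3) move a to ALU input 1  */
    { OP_LOAD,  1, ADDR_B, 0.0f }, /* 4) move b to ALU input 2  */
    { OP_ADD,   0, 0,      0.0f }, /* 5) add the two inputs     */
    { OP_STORE, 0, ADDR_C, 0.0f }, /* 6) place result in cell c */
};
const size_t program_len = sizeof program / sizeof program[0];

/* Execute n instructions of prog, one after another, on machine m. */
void run(ToyMachine *m, const Instr *prog, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        const Instr *in = &prog[i];
        assert(in->addr >= 0 && in->addr < NUM_CELLS);
        assert(in->reg == 0 || in->reg == 1);
        switch (in->op) {
        case OP_SET:   m->mem[in->addr] = in->imm;                  break;
        case OP_LOAD:  m->alu_in[in->reg] = m->mem[in->addr];       break;
        case OP_ADD:   m->alu_out = m->alu_in[0] + m->alu_in[1];    break;
        case OP_STORE: m->mem[in->addr] = m->alu_out;               break;
        }
    }
}

/* X instructions of Y bits each occupy X*Y bits of program space. */
size_t code_size_bits(size_t instr_count, size_t instr_width_bits)
{
    return instr_count * instr_width_bits;
}
```

With X = 6 and an assumed Y = 48, `code_size_bits` gives 288 bits, i.e. 36 bytes. A real compiler may emit more or fewer instructions than this guess, which is exactly why looking at the generated assembly is worthwhile.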

I am by no means an expert. So the best way to see what actually happens
is probably to take a look at the assembly version of
the C-function? I assume that C-functions are translated
into an equivalent assembly function, right?

This leads me to the question I wanna ask:

It would be awesome to have a tool where you specify which
platform you want to execute your function on and then you
load the function into the tool. The tool then generates an
equivalent architectural representation of the function. The tool
also demonstrates which hardware elements are used
during execution of the function, how/why they are used and
when they are used.

Having such a representation would make it easier to figure out
if the function executes in an optimal way on the given architecture
and it would also give you a hint as to where there is room for
optimization/improvement.

Am I making sense? What are your comments?

I look forward to be enlightened by the experts :o)



John wrote:

> Hi,
>
> I was wondering about this:
>
> I have programmed a C-function in VisualDSP++ and the function is
> going to be executed on the SHARC ADSP21364 EZKIT evaluation
> board.
>
> I want to know what actually happens on a hardware architectural level
> when the function executes.
>
> For the sake of simplicity, let's say I have this C-program:
>
> ----------------------
> float a,b,c;
>
> void main(void)
> {
>    a=2.3;
>    b=5.7;
>    c=a+b;
> }
> -----------------------
>
> Then I would like to know the series of events
> that takes place in the actual hardware.
>
> From looking at the C-program I guess the program
> uses 3x32bit of data memory. The program itself
> probably consists of the following instructions.
>
> 1) Set 32-bit cell to 2.3
> 2) Set 32-bit cell to 5.7
> 3) Move 2.3 to input 1 of ALU
> 4) Move 5.7 to input 2 of ALU
> 5) Ask ALU to add the 2 numbers together
> 6) Place result in 32-bit cell
>
> If the number of instructions is X and the bit-width
> of each instruction is Y then I guess the code takes
> up X*Y bits of program space.
>
> I am by no means an expert. So the best way to see what actually happens
> is probably to take a look at the assembly version of
> the C-function? I assume that C-functions are translated
> into an equivalent assembly function, right?

I think you need to study the basics of computer architecture. Fortunately, "computer architecture" is exactly the set of keywords you want to look for when you get yourself a book on the subject.

Once you understand the basics of computer architecture the best way to do this is to look at the assembly, yes. I think, however, you'll find that while your understanding is correct in principle there are a lot of details that are missing, and these details vary greatly from one processor to the next.

> This leads me to the question I wanna ask:
>
> It would be awesome to have a tool where you specify which
> platform you want to execute your function on and then you
> load the function into the tool. The tool then generates an
> equivalent architectural representation of the function. The tool
> also demonstrates which hardware elements are used
> during execution of the function, how/why they are used and
> when they are used.
>
> Having such a representation would make it easier to figure out
> if the function executes in an optimal way on the given architecture
> and it would also give you a hint as to where there is room for
> optimization/improvement.
>
> Am I making sense? What are your comments?
>
> I look forward to be enlightened by the experts :o)

Yes you are making sense, but I doubt that I'm going to satisfy your craving.

The part that executes functions for a given architecture is called a 'simulator', although it's often easier, and always more accurate, to get the real chip on an eval board to do testing with.

The efficiency of the compiled code varies greatly with the compiler used (this is, in fact, a big selling point for high $$ compilers), so there is no one tool -- for each given architecture you need at least one compiler. Basically you'll end up juggling a ton of different tool chains if you want to support multiple processors.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Posting from Google? See http://cfaj.freeshell.org/google/

John wrote:

> I have programmed a C-function in VisualDSP++ and the function is
> going to be executed on the SHARC ADSP21364 EZKIT evaluation
> board.
>
> I want to know what actually happens on a hardware architectural level
> when the function executes.

You can just run the program in the VisualDSP simulator - you can look at the whole computer (DSP), all its data and status registers, all memory, everything, as the program sequences. Try the disassembly and memory windows for starters.

...

> It would be awesome to have a tool where you specify which
> platform you want to execute your function on and then you
> load the function into the tool. The tool then generates an
> equivalent architectural representation of the function. The tool
> also demonstrates which hardware elements are used
> during execution of the function, how/why they are used and
> when they are used.

VisualDSP is such a tool. It does all you want for all the DSPs in the Analog Devices product range. Looks like you want to read up some of the documentation supplied with VisualDSP.

Regards,
Andor

John wrote:
> Hi,
>
> I was wondering about this:
>
> I have programmed a C-function in VisualDSP++ and the function is
> going to be executed on the SHARC ADSP21364 EZKIT evaluation
> board.
>
> I want to know what actually happens on a hardware architectural level
> when the function executes.

...

> It would be awesome to have a tool where you specify which
> platform you want to execute your function on and then you
> load the function into the tool. The tool then generates an
> equivalent architectural representation of the function. The tool
> also demonstrates which hardware elements are used
> during execution of the function, how/why they are used and
> when they are used.
>
> Having such a representation would make it easier to figure out
> if the function executes in an optimal way on the given architecture
> and it would also give you a hint as to where there is room for
> optimization/improvement.
>
> Am I making sense? What are your comments?
>
> I look forward to be enlightened by the experts :o)

I'm no expert on this. The only ASM programming I ever did was on the ix86 processors 15 years ago.

Given that caveat, what you ask for sounds like a cross-platform simulator/cross-compiler, and yes, from a user's perspective it makes a lot of sense.

I just saw some PR material for Intel's C++ compilers. The claim was that Intel's compilers produced executables that were some 20-30% faster than did other compilers. It seems like that increased speed is gained by the intimate knowledge of the architecture etc. on the Intel processor that is available to the SW people at Intel and not to others.

While certainly useful, a cross-platform system of this kind isn't likely ever to be realized. Part of the problem is that no one would ever (voluntarily) take on the task of coordinating the information needed to achieve that. It's a Herculean task. And no one would ever (voluntarily) give up in-house knowledge about the weaknesses and strengths of their processors. That's the sort of thing that is kept in-house for business purposes.

Rune

> VisualDSP is such a tool. It does all you want for all the DSPs in the
> Analog Devices product range.

The statistical profiler is more or less useless in my opinion. It tells you how much time is spent in various code sections. Nice info to get started with, but I could figure that out by timing code sections myself.

The only thing I see coming close to what I want is that I am able to see the assembly code of the compiled C-function. But then again....I can see each instruction in a window in VisualDSP, but the manuals I have looked in do not tell me what exactly happens on an architectural level when a particular instruction is executed. And that's what I want to know....

I have also downloaded all the available documentation for the board I am using. Usually you would find a reference guide for the instruction set. But I haven't been able to locate a simple thing like that....Anybody got a clue where I can find that?

Back in the simple 80s when I was doing assembly coding on a Commodore 64 I had a reference guide where I could look up instructions and see how many cycles an instruction took to execute, thus enabling me to tweak my code.

I hope this clarifies my problem....sorry if I didn't express my problem clearly enough :o)

"John" <john@jnho.hnjo.invalid.com> wrote in message 
news:e1uef6$1s9n$1@newsbin.cybercity.dk...
>> VisualDSP is such a tool. It does all you want for all the DSPs in the
>> Analog Devices product range.
>
> The statistical profiler is more or less useless in my opinion. It tells you
> how much time is spent in various code sections. Nice info to get started with
> but I could figure that out myself by timing code sections myself. The only
> thing I see coming close to what I want is that I am able to see the assembly
> code of the compiled C-function. But then again....I can see each instruction
> in a window in VisualDSP, but the manuals I have looked in does not tell
> me what exactly happens on an architectural level when a particular
> instruction is executed. And that's what I want to know....

In the instruction set reference, each individual assembly instruction is described in some detail. There is also some discussion of the op-codes and fields of the 48-bit instructions (e.g. what bits specify the source/destination registers, etc.). Is that what you want?
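
To illustrate what "bits specify the source/destination registers" means in practice, here is a minimal sketch of pulling fields out of a 48-bit instruction word. The field positions used here are purely hypothetical and are NOT the real SHARC encoding; the actual layout has to be taken from the instruction set reference. `Fields`, `bits` and `decode48` are made-up names for this example.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL layout for illustration only (not the SHARC encoding):
 *   bits 47..40  opcode
 *   bits 39..36  destination register
 *   bits 35..32  source register X
 *   bits 31..28  source register Y
 *   bits 27..0   unused in this sketch
 */
typedef struct {
    unsigned opcode;
    unsigned dest;
    unsigned src_x;
    unsigned src_y;
} Fields;

/* Extract bits hi..lo (inclusive) of word. */
static uint64_t bits(uint64_t word, unsigned hi, unsigned lo)
{
    return (word >> lo) & ((UINT64_C(1) << (hi - lo + 1)) - 1);
}

/* Split a 48-bit instruction word (held in a uint64_t) into its fields. */
Fields decode48(uint64_t word)
{
    Fields f;
    assert((word >> 48) == 0); /* only the low 48 bits may be set */
    f.opcode = (unsigned)bits(word, 47, 40);
    f.dest   = (unsigned)bits(word, 39, 36);
    f.src_x  = (unsigned)bits(word, 35, 32);
    f.src_y  = (unsigned)bits(word, 31, 28);
    return f;
}
```

The same shift-and-mask pattern applies to whatever field positions the manual documents; only the bit ranges passed to `bits` would change.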

> I have also downloaded all the available documentation for the
> board I am using. Usually you would find a reference guide for
> the instruction set. But I haven't been able to locate a simple
> thing like that....Anybody got a clue where I can find that?
>
> Back in the simple 80s when I was doing assembly coding
> on a Commodore 64 I had a reference guide where I could look up
> instructions and see how many cycles an instruction took to
> execute thus enabling me to tweak my code.

OK, that triggered something for me. I think you are looking for the microcode behind each instruction? For one thing, keep in mind that _every_ instruction on the SHARC DSPs typically happens in exactly one clock cycle. I say 'typically' because there are some things that can cause execution 'stalls', such as accessing slow memory, conflicts for a data bus, certain previous instructions, cache misses, etc. (I guess one exception would be branching instructions which normally take 3 cycles, but even for these, you can make them single-cycle with a delayed branch if you want.) But these stalls are not dependent on the instruction itself, but rather on the conditions of the chip while executing the instructions, i.e. previous instructions, state of the cache, DMAs happening at the same time. So you wouldn't say that a particular assembly instruction takes longer than others, though in certain contexts they may. All the various stalls are documented.

So perhaps the reason you can't find what you are looking for is that there aren't any multi-cycle microcoded instructions.

BTW, the easiest way I have found to optimize my code is to time it with the simulator's cycle counter. That will accurately tell you if stalls are occurring. If you step through the code and see it increment by more than one for a single assembly instruction, you've found a stall. Then you have to figure out why and fix them!

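
The cycle-counter check described above can be sketched in plain C. This assumes you have noted the simulator's cycle counter before each single step and once more after the last one, and typed those readings into an array by hand. `find_stalls` is a hypothetical helper written for this example, not part of VisualDSP.

```c
#include <assert.h>
#include <stddef.h>

/*
 * cycles[i] is the cycle-counter reading just before instruction i is
 * stepped; the final entry is the reading after the last instruction.
 * Any instruction whose step advanced the counter by more than one
 * cycle is reported as a stall candidate.
 *
 * Up to max_out instruction indices are written to stalled[]; the
 * return value is the total number of candidates found.
 */
size_t find_stalls(const unsigned long *cycles, size_t n_readings,
                   size_t *stalled, size_t max_out)
{
    size_t found = 0;
    for (size_t i = 0; i + 1 < n_readings; ++i) {
        assert(cycles[i + 1] >= cycles[i]); /* counter must not run backwards */
        if (cycles[i + 1] - cycles[i] > 1) {
            if (found < max_out)
                stalled[found] = i;
            ++found;
        }
    }
    return found;
}
```

For readings of 100, 101, 102, 105, 106 the third instruction (index 2) is flagged, because the counter jumped by three cycles across it. Note that a non-delayed branch would be flagged as well, since it normally takes more than one cycle even without a stall.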
"Jon Harris" <jon99_harris7@hotmail.com> wrote in
news:AxE0g.2380$e55.2268@trnddc02: 

> "John" <john@jnho.hnjo.invalid.com> wrote in message
> news:e1uef6$1s9n$1@newsbin.cybercity.dk...
>>
>>> VisualDSP is such a tool. It does all you want for all the DSPs in
>>> the Analog Devices product range.
>>
>> The statistical profiler is more or less useless in my opinion. It
>> tells you how much time is spent in various code sections. Nice info
>> to get started with but I could figure that out myself by timing code
>> sections myself. The only thing I see coming close to what I want is
>> that I am able to see the assembly code of the compiled C-function.
>> But then again....I can see each instruction in a window in
>> VisualDSP, but the manuals I have looked in does not tell
>> me what exactly happens on an architectural level when a particular
>> instruction is executed. And that's what I want to know....
>
> In the instruction set reference, each individual assembly instruction
> is described in some detail. There is also some discussion of the
> op-codes and fields of the 48-bit instructions (e.g. what bits specify
> the source/destination registers, etc.). Is that what you want?
>
>> I have also downloaded all the available documentation for the
>> board I am using. Usually you would find a reference guide for
>> the instruction set. But I haven't been able to locate a simple
>> thing like that....Anybody got a clue where I can find that?

The basic SHARC instruction manual is the ADSP-21160 Instruction Set Reference. It should have been renamed when the newer processors came out.

http://www.analog.com/processors/epManualsDisplay/0,2795,,00.html?SectionWeblawId=433&ContentID=95537&Language=English

--
Al Clark
Danville Signal Processing, Inc.
--------------------------------------------------------------------
Purveyors of Fine DSP Hardware and other Cool Stuff
Available at http://www.danvillesignal.com

Hi all,

I am no expert and perhaps to their detriment, I agree with the
previous posters.

Compile the code - grab the code snippet of interest - simulate it in
VDSP - and watch the data move from memory to the ALU and back again.
Time was you might not even recognize the snippet after the compiler
got done with it but those times have changed.  The environments now
will even tell you where it is.  Whippersnapper... back in my day...
:-)

The bigger issue in my view is:  What happens when the snippet is not
so straightforward (a+b=c) and you have latitude with respect to
processor choice?  As soon as your algorithm starts to rely on internal
peripherals such as barrel shifters, specific MAC behavior, or certain
DAG features, or "less internal" features such as the IO unit, you
start to see bigger differences between DSP manufacturers and even
processor families within a manufacturer.  When I am decomposing an
algorithm, I will break it up based on what my processor does well, but
I need to know a bit about the architecture of a number of processors in
order to be assured that my platform is the right guy for the job.

I don't think that a tool such as that proposed by the original poster
is possible - mostly for commercial reasons.  But the proposal shines a
light on the age-old problem of algorithm decomposition and the optimal
matching of the functional blocks in a decomposed algorithm to the
functional units and their behavior within a processor or field of
processors.  Maybe I have sat in one too many technical presentations
where a DSP was used to implement a fuzzy logic motor speed controller.
 One is one too many, BTW.

BR,
Dan

> The basic SHARC instruction manual is the ADSP-21160 Instruction Set
> Reference. It should have been renamed when the newer processors came
> out.

Thank you very much. Is this reference complete and accurate for my purposes, given that it was written for the ADSP-21160 but I am going to use it for the ADSP-21364?

> The basic SHARC instruction manual is the ADSP-21160 Instruction Set
> Reference. It should have been renamed when the newer processors came
> out.
>
> http://www.analog.com/processors/epManualsDisplay/0,2795,,00.html?SectionWeblawId=433&ContentID=95537&Language=English

Hi again...

The syntax of some of the instructions looks pretty weird. Sorry for asking a stupid question, but is that actually how you write the instruction? If not, how do I interpret the syntax? What are those vertical bars?

I read the beginning of the document and there is a little tiny section regarding syntax and conventions..