Improve automatically generated code

Started by Tim Frink March 15, 2009
I'm looking for ways to improve the performance 
(i.e. program run time) of my C code. The code was automatically 
generated from MATLAB models by TargetLink and has a simple 
structure that you can find throughout all files in the project:

int a;
int b;

void f1(void)
{
   int _x = a;
   int _y = b;

   int _z = _y & 8;
   
   if( _x > _z )
    _x = _y | 3;
   else
    _x = 0xff;

   a = _x;
   b = _y;
}

Basically, each file contains a couple of global variables which
are used across different files. Within each file you mainly find
functions without a return value or parameters, i.e. void f1(void).
In the functions, new local variables are defined and initialized
with the global values. Within the functions you typically have
multiple if-then-else statements which compare values with <, >, == 
and != against other values and assign the variables new values, 
often using bit-wise operators like |, & or <<. Finally, before 
leaving the function, the values of the local variables are written 
back to the global variables. There are some global arrays, and pointers 
are rarely used. There are also no loops.

I'm currently compiling the code with GCC at the highest optimization
level (-O3). However, I'm not sure if the compiler will automatically
optimize this type of code well or if I could help the compiler in some
way to get even more performance out of it. 

Thus, do you have any suggestions on how the code could be further optimized?
Should additional pragmas be used? And which compiler optimizations will,
in your opinion, help to improve this type of code? Should some
optimizations be preferred (e.g. by changing default optimization
parameters to let a particular optimization transform the code more
aggressively), or should some compiler optimizations be disabled since
they possibly degrade performance?

Best,
Tim

Tim Frink wrote:
> I'm looking for possibilities to improve the performance
> (i.e. program run time) of my C code. The code was automatically
> generated from MATLAB models.
> Basically each file contains a couple of global variables which
> are used across different files.

Global variables disable several optimizations. Separate files can also disable some cross-function optimizations.

> Within each file you mainly find
> functions without a return value and parameter, i.e. void f1(void).

Who calls these functions? Do functions call other functions?

Hendrik vdH

On 15 Mar, 14:23, Tim Frink <plfr...@yahoo.de> wrote:
> I'm looking for possibilities to improve the performance
> (i.e. program run time) of my C code. The code was automatically
> generated from MATLAB models by TargetLink and has a simple
> structure which you can find through all files in the project:
...
> Basically each file contains a couple of global variables which
> are used across different files.

A good optimizer (and as I understand it, GCC is one) ought to be able to squeeze almost maximum performance out of this type of program.

The fact that you use global variables seems a bit dodgy, though. This means that the program almost certainly uses far memory access in and out of these functions, which in turn means that a lot of time is wasted compared to local memory access.

If the posted function is representative of the functions in your project, interfacing across compilation units is likely to be a significant factor as far as run time is concerned. The ideal case would be to have all the functions in one source file (as well as using proper argument passing in and out of functions), thus ensuring fast local memory access.

I don't know how to check this. Maybe profilers are good enough these days to map these differences, maybe not.

Rune
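
To make the argument-passing suggestion concrete, here is a minimal sketch of the posted f1() rewritten so that its inputs arrive as parameters and its results are returned by value. The names f1_state and f1_pure are invented for illustration and are not TargetLink output. Whether this is actually faster than the global-variable version depends on the target and the compiler, so it would have to be measured.

```c
/* Hypothetical result type; the name is illustrative only. */
typedef struct {
    int a;
    int b;
} f1_state;

/* Same logic as the posted f1(), but without touching globals:
   inputs are parameters, outputs are returned by value. */
static inline f1_state f1_pure(int a, int b)
{
    f1_state out;
    int z = b & 8;

    out.a = (a > z) ? (b | 3) : 0xff;
    out.b = b;              /* b is passed through unchanged */
    return out;
}
```

A caller that still needs the globals could load them once, chain several such functions, and store the final values once at the end.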

On Mar 15, 8:23 am, Tim Frink <plfr...@yahoo.de> wrote:
> Thus, do you have any suggestions how the code could be further optimized?
> Should additional pragmas be used? And which compiler optimizations will
> in your opinion help to improve this type of code? Should some
> optimizations be preferred (e.g. by changing default optimization
> parameters to allow a particular optimization a more aggressive
> transformation) or should some compiler optimizations be disabled since
> they possibly degrade performance?

You will get the fastest, portable, compiler-optimizable code by using macros. Second to that, you could declare all functions 'inline'; good compilers will avoid the overhead of an actual function call as best as possible, and should also toss the prologue/epilogue code, which, for small functions, is a performance killer. Adding the 'register' keyword helped about 20 years ago, but might not anymore.

As others have pointed out, aside from speed, the model of global variables is a bit weird. ;) If you can figure out a way to make every function take values and return a value, that would be fine. Then make them macros and store them in a .h file, and you will get the fastest possible code from the compiler.

If speed is really an issue, and you have total control over code generation, there is always inline _asm, which would be trivial for these types of functions. The speed improvement would be dramatic. On x86 systems, you can generate _asm code "in the raw", without an assembler at all, using _declspec(naked). It seems the GCC equivalent, __attribute__((naked)), is not available for x86.

So to summarize, the fastest code that you can get would be to make all functions macros, thereby eliminating the possibility of prologue/epilogue code, where the body of the function consists of a mix of _asm statements and C variables. I do this in some of my Big Integer operations.

-Le Chaud Lapin-
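
As a rough illustration of the two options named in this post, the sketch below writes the comparison from the posted f1() once as a macro and once as a static inline function suitable for a header. Both names (F1_NEXT_A and f1_next_a) are made up for this example. With optimization enabled, a compiler such as GCC will typically expand both at the call site, though that is not guaranteed for the inline function.

```c
/* Macro form: expanded textually at every use. Each argument may be
   evaluated more than once, so arguments must not have side effects. */
#define F1_NEXT_A(a, b)  (((a) > ((b) & 8)) ? ((b) | 3) : 0xff)

/* static inline form: the same computation, but type-checked and with
   each argument evaluated exactly once. */
static inline int f1_next_a(int a, int b)
{
    return (a > (b & 8)) ? (b | 3) : 0xff;
}
```

The inline function is usually the safer choice: F1_NEXT_A(x, y++) would increment y twice, while f1_next_a(x, y++) would not.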


Le Chaud Lapin wrote:
> So to summarize, the fastest code that you can get would be to make
> all functions macros, thereby eliminating the possibility of prologue/
> epilogue code, where body of function consists of a mix of _asm
> statements and C variables. I do this in some of my Big Integer
> operations.

It depends. Declaring functions as inline or as macros is kind of the same thing as loop unrolling. It eliminates the housekeeping code but adds overhead for code translation, cache loading and jump prediction.

BTW, sometimes it would be good to have the opposite of the "inline" modifier, to force the compiler NOT to inline a particular function automatically. For example, I have some fast and small functions placed in L1, but the compiler tries to inline them, so the code goes to L2. The modifier could be called ~inline foo() or volatile foo(), but there is no such thing as far as I know.

Mixing C with inline asm is an awful practice that combines the disadvantages of both C and asm. If there is a need to code in assembly, do it as a C-callable assembly module.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com

Vladimir Vassilevsky wrote:
> BTW, sometimes it would be good to have the opposite to "inline"
> modifier to force the compiler NOT to inline a particular function
> automatically. For example, I have some fast and small functions placed
> in L1 but the compiler tries to inline them so the code goes to L2. The
> modifier could be called ~inline foo() or volatile foo(), but there is
> no such thing as far as I know.

Not an ideal solution, but if you define the function in a separate C file, it cannot be inlined...

--
Oli
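
For GCC specifically, the noinline function attribute appears to cover this case without moving the function to another file. A minimal sketch, assuming GCC or a compatible compiler; small_kernel and caller are made-up names:

```c
/* GCC-specific attribute: the function is never considered for inlining,
   so its body stays a separate function wherever the linker places it. */
__attribute__((noinline))
int small_kernel(int x)
{
    return (x << 1) | 1;
}

int caller(int x)
{
    /* small_kernel is not inlined here when built with GCC */
    return small_kernel(x) + small_kernel(x + 1);
}
```

The GCC manual notes that calls to a noinline function without side effects can still be removed by optimizations other than inlining, and suggests putting asm(""); in the called function if the calls themselves must be kept.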

On Mar 15, 10:59 am, Vladimir Vassilevsky <antispam_bo...@hotmail.com>
wrote:
> Le Chaud Lapin wrote:
>
> Mixing C with inline asm is awful practice combining disadvantages of
> both C and asm. If there is a need to code in assembly, do this as the
> C callable assembly module.

Not always.

There are some cases where the overhead of a function call is simply intolerable. Every serious developer of a Big Integer library I know uses essentially the same technique on x86 machines: mostly C/C++, with macros for critical operations. Something like:

#define MULTIPLY(MULTIPLICAND, MULTIPLIER, PRODUCT_UPPER_WORD, PRODUCT_LOWER_WORD)\
{\
    __asm\
    {\
        __asm mov eax, MULTIPLICAND\
        __asm mul MULTIPLIER\
        __asm mov PRODUCT_UPPER_WORD, edx\
        __asm mov PRODUCT_LOWER_WORD, eax\
    }\
}

There is no faster way to do it other than straight assembly everywhere, and the difference in speed for cryptographic operations is too great to ignore. A function call would ruin it.

_asm has its place.

-Le Chaud Lapin-

Tim Frink wrote:
> I'm currently compiling the code with GCC and the highest
> optimization level (-O3). However, I'm not sure if the compiler
> will automatically optimize this type of code well or if I could
> help the compiler in some ways to even get out more performance.
>
> Thus, do you have any suggestions how the code could be further
> optimized? Should additional pragmas be used? And which compiler
> optimizations will in your opinion help to improve this type of
> code?

None of those questions can be reliably answered from a distance, much less without even knowing the target processor. Use a profiler to identify the real hotspots, then look at their assembly (option -S) to compare optimization outcomes. From the looks of your representative function, you might well be wasting your time worrying.

And if, as others suggested, module communication turns out to be a big factor, you're much better off using a whole-program optimizing compiler than modifying generated code. It seems GCC is among those now, but I can't speak to the results.

Martin

--
Quidquid latine scriptum est, altum videtur.

Weak example. This is how it is done in C:

union
{
    /* lo/hi member order assumes a little-endian target */
    struct
    {
        u32 fubar_lo;
        u32 fubar_hi;
    } fubar_32;
    u64 fubar_64;
} fubar;

u32 fu, bar;

fubar.fubar_64 = fu * (u64)bar;

Any sensible compiler will produce essentially the same code as the inline
assembly macro below, without clobbering registers or adding unnecessary
load/store operations.


Vladimir Vassilevsky
DSP and Mixed Signal Consultant
www.abvolt.com
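
A self-contained variant of the same idea, sketched with the standard <stdint.h> types in place of u32/u64. The helper name mul32x32 is invented for this example. It splits the product with shifts instead of a union, so the result does not depend on the byte order of the target:

```c
#include <stdint.h>

/* Full 32x32 -> 64-bit unsigned product, returned as two 32-bit halves. */
static inline void mul32x32(uint32_t a, uint32_t b,
                            uint32_t *hi, uint32_t *lo)
{
    /* Widen one operand first so the multiplication is done in 64 bits. */
    uint64_t p = (uint64_t)a * b;

    *hi = (uint32_t)(p >> 32);
    *lo = (uint32_t)p;
}
```

On 32-bit x86, an optimizing compiler will usually reduce this to a single mul instruction, but the generated assembly (option -S) is the only way to be sure for a given compiler and target.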






On Mar 15, 10:59 am, Vladimir Vassilevsky <antispam_bo...@hotmail.com>
wrote:
>> Mixing C with inline asm is awful practice combining disadvantages of
>> both C and asm. If there is a need to code in assembly, do this as the
>> C callable assembly module.
>
> Not always.
>
> There are some cases where the overhead of a function call is simply
> intolerable. Every serious developer of Big Integer library I know
> use essentially the same technique on x86 machine. Mostly C/C++, with
> macros for critical operations. Something like:
>
> #define MULTIPLY(MULTIPLICAND, MULTIPLIER, PRODUCT_UPPER_WORD, PRODUCT_LOWER_WORD)\
> {\
>     __asm\
>     {\
>         __asm mov eax, MULTIPLICAND\
>         __asm mul MULTIPLIER\
>         __asm mov PRODUCT_UPPER_WORD, edx\
>         __asm mov PRODUCT_LOWER_WORD, eax\
>     }\
> }
>
> There is no faster way to do it other than straight assembly
> everywhere, and the difference in speed for cryptographic operations is
> too great to ignore. A function call would ruin it.
>
> _asm has its place.

On 16 Mar, 10:41, "Vladimir Vassilevsky" <antispam_bo...@hotmail.com>
wrote:
> Any sensible compiler will produce essentially the same code as the inline
> assembly macro below, without clobbering the registers and unnecessary
> load/store operations.

Maybe, but it seems the Rabbit's point is that the function calls involved in non-macro code dominate the run time:

>> There are some cases where the overhead of a function call is simply
>> intolerable.

Just out of curiosity, how would templates perform if you used C++ for these kinds of things?

Rune