Hi all,
What do you think about the C compiler for the DSPs you are
using?
Sometimes the code runs inefficiently, and I don't know whether the cause is my
poor code or a poor C compiler.
I have reviewed the generated code and found that many instructions are used to
compute addresses. How can we
evaluate the C compilers that the chip vendors provide? BDTI only
benchmarks the core.
Thanks
Jogging
performance of C compiler
Started by ●November 6, 2007
Reply by ●November 6, 2007
joggingsong@gmail.com wrote:
> Hi,all
> What do you think of about the c compiler of the DSPs you are
> using?
> Sometimes the codes works inefficiently, I don't know the reason, my
> poor codes or poor c compiler.
> I have reviewed the codes and found many instructions are used to
> compute the addresses. How can we
> evaluate the c compilers that the chip vendors provide? BDTI only
> benchmarks the core.
>
> Thanks
> Jogging
>

Well, you can always do your own benchmarks of a few C compilers. That seems to be a popular method of comparing compilers/processors. In order to do a good job (i.e. a fair comparison) you will need to really make sure you know what you're doing. A couple of things off the top of my head that you should make sure are the same:

* optimization level (full on)
* similar memory placement (i.e. both external or both internal)

Of course you probably won't be able to extract maximum value from the compilers during some simple benchmarking without doing some significant work. Other things you might want to do:

* Look into intrinsics to get C access to special-purpose instructions
* Pragmas to give the compiler more info on alignments, loop unrolling, etc.
* Use of vendor-provided optimized DSP libraries/routines

If you're looking for the compiler to give you the best performance for "natural C" then you could skip the latter steps. However, in an embedded environment you traditionally need to do that extra work or else pay a lot more money for a beefier device.

This relates back to your post a couple of months ago regarding memory architecture. The ultimate performance will be dictated not only by the instructions spewed out by the C compiler, but also by the memory architecture of the device (i.e. how quickly it can get code/data).

Brad
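The do-it-yourself benchmark Brad describes could be sketched roughly as follows. This is only an illustration under assumptions: it uses the standard `clock()` from a hosted C environment, whereas on a real DSP you would more likely read the device's cycle counter. The names `fir` and `bench_fir` and their parameters are hypothetical and do not come from any vendor library.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical kernel under test: a "natural C" FIR-style dot product.
   x must hold at least n + taps - 1 samples; h[k] is applied to x[i + k]. */
void fir(const short *x, const short *h, long *y, size_t n, size_t taps)
{
    for (size_t i = 0; i < n; i++) {
        long acc = 0;
        for (size_t k = 0; k < taps; k++)
            acc += (long)x[i + k] * h[k];
        y[i] = acc;
    }
}

/* Run the kernel `reps` times and return the elapsed CPU time in seconds.
   For a fair comparison, build with the same optimization level and the
   same memory placement on every compiler being compared. */
double bench_fir(const short *x, const short *h, long *y,
                 size_t n, size_t taps, int reps)
{
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        fir(x, h, y, n, taps);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Calling `bench_fir` with the same input buffers on each toolchain gives one number per compiler. As noted above, that number reflects the instruction set and memory architecture as well as the compiler.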
Reply by ●November 6, 2007
<joggingsong@gmail.com> wrote in message news:1194337277.137514.256010@e9g2000prf.googlegroups.com...
> Hi,all
> What do you think of about the c compiler of the DSPs you are
> using?
> Sometimes the codes works inefficiently, I don't know the reason, my
> poor codes or poor c compiler.
> I have reviewed the codes and found many instructions are used to
> compute the addresses. How can we
> evaluate the c compilers that the chip vendors provide? BDTI only
> benchmarks the core.

Do not look at the code produced by a compiler. There is no point. It gives you nothing but frustration. It is absolutely unimportant if one compiler is better than the other by 5% in speed. If you need more speed, you can always redo the critical parts in assembly.

What is important is how many bugs there are in the compiler and its libraries. Hunting compiler bugs can take a lot of time and effort.

Vladimir Vassilevsky
DSP and Mixed Signal Consultant
www.abvolt.com
Reply by ●November 6, 2007
Brad Griffis wrote:
> joggingsong@gmail.com wrote:
>> Hi,all
>> What do you think of about the c compiler of the DSPs you are
>> using?
>> Sometimes the codes works inefficiently, I don't know the reason, my
>> poor codes or poor c compiler.
>> I have reviewed the codes and found many instructions are used to
>> compute the addresses. How can we
>> evaluate the c compilers that the chip vendors provide? BDTI only
>> benchmarks the core.
>>
>> Thanks
>> Jogging
>>
>
> Well, you can always do your own benchmarks of a few C compilers. That
> seems to be a popular method of comparing compilers/processors. In
> order to a do a good job (i.e. fair comparison) you will need to really
> make sure you know what you're doing. A couple things off the top of my
> head that you should make sure are the same:
>
> * optimization level (full on)
> * similar memory placement (i.e. both external or both internal)
>
> Of course you probably won't be able to extract maximum value from the
> compilers during some simple benchmarking without doing some significant
> work. Other things you might want to do:
>
> * Look into intrinsics to get C-access to special purpose instructions
> * Pragmas to give the compiler more info on alignments, loop unrolling,
> etc.
> * Use of vendor provided optimized DSP libraries/routines
>
> If you're looking for the compiler to give you best performance for
> "natural C" then you could skip the latter steps. However, in an
> embedded environment you traditionally need to do that extra work or
> else pay a lot more money for a beefier device.
>
> This relates back to your post a couple months ago regarding memory
> architecture. The ultimate performance will be dictated not only by the
> instructions spewed out by the C compiler, but also based on the memory
> architecture of the device (i.e. how quickly can it get code/data).
>
> Brad

I just had one more thought on this -- the "efficiency" of the C compiler is also going to be heavily influenced by the underlying instruction set and memory architecture. If you're doing benchmarking then you are not only testing out the compiler, but you're also testing out the instruction set and the memory architecture.

I think Vladimir makes a great point to avoid buggy compilers! Perhaps the corollary would be to make sure you download the latest bug-fix release for a given compiler!

Brad
Reply by ●November 6, 2007
On Nov 6, 2:21 am, joggings...@gmail.com wrote:
> Hi,all
> What do you think of about the c compiler of the DSPs you are
> using?
> Sometimes the codes works inefficiently, I don't know the reason, my
> poor codes or poor c compiler.
> I have reviewed the codes and found many instructions are used to
> compute the addresses. How can we
> evaluate the c compilers that the chip vendors provide? BDTI only
> benchmarks the core.
>
> Thanks
> Jogging

As Vladimir states, you can always optimize in assembly. You should architect your code so that the portions that may be performance bottlenecks can be readily optimized in assembly if required. I always write code in the highest-level language (e.g. C or C++) possible and optimize only when necessary; this is much more productive than trying to write the tightest code possible from the beginning.

Darol Klawetter
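One possible way to keep a hot spot replaceable, as suggested above, is to hide it behind a single name that is resolved at build time. The sketch below is only an assumption about how that might look: `dot_c`, `dot_asm`, `energy`, and the `USE_ASM_DOT` macro are hypothetical names, and the assembly routine itself is not shown.

```c
#include <stddef.h>

/* Portable reference implementation, written first and kept as the baseline. */
long dot_c(const short *a, const short *b, size_t n)
{
    long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (long)a[i] * b[i];
    return acc;
}

/* Hypothetical hand-written routine, linked in only when USE_ASM_DOT is
   defined at build time. It must match the C version's signature. */
#ifdef USE_ASM_DOT
long dot_asm(const short *a, const short *b, size_t n);
#define dot dot_asm
#else
#define dot dot_c
#endif

/* Callers use only dot(), so the bottleneck can be swapped for assembly
   later without touching them. */
long energy(const short *x, size_t n)
{
    return dot(x, x, n);
}
```

Keeping `dot_c` around also gives a reference to test any later assembly version against.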
Reply by ●November 6, 2007
Vladimir Vassilevsky wrote:
> <joggingsong@gmail.com> wrote in message
>>I have reviewed the codes and found many instructions are used to
>>compute the addresses. How can we
>>evaluate the c compilers that the chip vendors provide? BDTI only
>>benchmarks the core.
>
> Do not look at the code produced by a compiler. There is no point. It gives
> you nothing but frustration. It is absolutely unimportant if one compiler is
> better then the other by 5% speed.

I do look at the assembly code - to see that I could hardly improve it by hand (and learn a new instruction once in a while). For DSP code, I use VisualDSP++ for Blackfin, which is, for this kind of code, really good in my opinion, and an order of magnitude ahead of the competition.

You need to be nice to the compiler, though. For Blackfins: modulo addressing works best with unsigned indexes, loop unrolling works better when you use 'restrict', etc. Most important, use compiler builtins for fract multiplication instead of multiply-and-shift, because compilers unfortunately are not yet smart enough to figure this out themselves. For different compilers, the required level of niceness will surely differ.

So far, I had to resort to assembly in DSP code just once, to implement some rather funky kind of loop unrolling.

Stefan
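The hints above could look roughly like this in portable C99. This is a sketch under assumptions, not VisualDSP++ code: `q15_mul`, `scale_q15`, and `ring_read` are hypothetical names, and `q15_mul` is the plain multiply-and-shift form that, on a real target, you would replace with the toolchain's fractional-multiply builtin (check the compiler manual for its exact name). Whether the `%` on an unsigned index maps onto hardware modulo addressing depends on the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Portable Q15 multiply written as multiply-and-shift: no rounding and no
   saturation. Assumes >> on a negative value is an arithmetic shift, which
   is implementation-defined in C but usual in practice. */
static inline int16_t q15_mul(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * b) >> 15);
}

/* Scale n samples by a Q15 gain. `restrict` promises the compiler that
   `in` and `out` do not overlap, which gives it more freedom to unroll. */
void scale_q15(const int16_t *restrict in, int16_t *restrict out,
               unsigned n, int16_t gain)
{
    for (unsigned i = 0; i < n; i++)
        out[i] = q15_mul(in[i], gain);
}

/* Copy n samples out of a circular buffer of length len, starting at
   index start. The index is unsigned, so wrapping is a plain modulo. */
void ring_read(const int16_t *restrict ring, unsigned len, unsigned start,
               int16_t *restrict out, unsigned n)
{
    assert(len > 0);
    unsigned idx = start % len;
    for (unsigned i = 0; i < n; i++) {
        out[i] = ring[idx];
        idx = (idx + 1u) % len;
    }
}
```

For example, scaling a sample of 16384 (0.5 in Q15) by a gain of 16384 yields 8192 (0.25 in Q15).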
Reply by ●November 6, 2007
Stefan Reuther <stefan.news@arcor.de> writes:
> [...]
> I do look at the assembly code - to see that I could hardly improve it
> by hand (and learn a new instruction once in a while).

Deciding that the code coming out of the compiler can't be further optimized is like deciding there's nothing you can do to improve a cooked meal. Most of the significant opportunities for optimization are in the process of deciding an implementation (e.g., how to group data), before the code is already laid down.

--
%  Randy Yates                  % "Bird, on the wing,
%% Fuquay-Varina, NC            %   goes floating by
%%% 919-577-9882                %   but there's a teardrop in his eye..."
%%%% <yates@ieee.org>           % 'One Summer Dream', *Face The Music*, ELO
http://www.digitalsignallabs.com
Reply by ●November 6, 2007
Stefan Reuther wrote:
> Vladimir Vassilevsky wrote:
>
>><joggingsong@gmail.com> wrote in message
>>
>>>I have reviewed the codes and found many instructions are used to
>>>compute the addresses. How can we
>>>evaluate the c compilers that the chip vendors provide? BDTI only
>>>benchmarks the core.
>>
>>Do not look at the code produced by a compiler. There is no point. It gives
>>you nothing but frustration. It is absolutely unimportant if one compiler is
>>better then the other by 5% speed.
>
>
> I do look at the assembly code - to see that I could hardly improve it
> by hand (and learn a new instruction once in a while).

OK, you can look through the code, if it makes you feel better. There is no other point in doing that :-)

> For DSP code, I
> use VisualDSP++ for Blackfin, which is, for this kind of code, really
> good in my opinion, and a magnitude ahead of the competition.

VDSP produces fair code which is neither significantly better nor worse than the code produced by CCS from TI. It is prone to the typical inefficiencies of compiler-generated code, so it can be manually rewritten with an improvement by an average factor of ~two. Nor is VDSP bug free; I have encountered some very unpleasant bugs with the optimizer. My impression is that VDSP is a typical compiler; there is really nothing outstanding about it.

> You need to be nice to the compiler, though.

The compiler is for the man, not the man for the compiler.

> For Blackfins: Modulo
> addressing works best with unsigned indexes, loop unrolling works better
> when you use 'restrict', etc. Most important, use compiler builtins for
> fract multiplication instead of multiply-and-shift, because compilers
> unfortunately are not yet smart to figure this out themselves. For
> different compilers, the required level of niceness will surely differ.

The whole point of using the compiler is avoiding a flea hunt like that. The C++ code should be nice, portable and easy to modify.

> So far, I had to resort to assembly in DSP code just once, to implement
> some rather funky kind of loop unrolling.

After I rewrote the critical chunk of BF VDSP code in assembly, the throughput of the system was increased by the mere factor of 5.

Vladimir Vassilevsky
DSP and Mixed Signal Design Consultant
http://www.abvolt.com
Reply by ●November 6, 2007
Vladimir Vassilevsky wrote:
>
>
> Stefan Reuther wrote:
>
>> Vladimir Vassilevsky wrote:
>>
>>> <joggingsong@gmail.com> wrote in message
>>>
>>>> I have reviewed the codes and found many instructions are used to
>>>> compute the addresses. How can we
>>>> evaluate the c compilers that the chip vendors provide? BDTI only
>>>> benchmarks the core.
>>>
>>> Do not look at the code produced by a compiler. There is no point. It
>>> gives you nothing but frustration. It is absolutely unimportant if one
>>> compiler is better then the other by 5% speed.
>>
>>
>> I do look at the assembly code - to see that I could hardly improve it
>> by hand (and learn a new instruction once in a while).
>
> OK, you can look through the code, if it makes you feel better. There is
> no other point to do that :-)
>
>> For DSP code, I
>> use VisualDSP++ for Blackfin, which is, for this kind of code, really
>> good in my opinion, and a magnitude ahead of the competition.
>
> VDSP produces a fair code which is neither significantly better no worse
> then the code produced by CCS from TI. It is prone to the typical
> inefficiencies of the compiler generated code, so it can be manually
> rewritten with the imrovement of the average factor of ~two. Neither
> VDSP is bug free; I have encountered some very unpleasant bugs with the
> optimizer. My impression is VDSP is a typical compiler, there is really
> nothing outstanding about it.
>
>> You need to be nice to the compiler, though.
>
> The compiler is for the man, not the man for the compiler.
>
>> For Blackfins: Modulo
>> addressing works best with unsigned indexes, loop unrolling works better
>> when you use 'restrict', etc. Most important, use compiler builtins for
>> fract multiplication instead of multiply-and-shift, because compilers
>> unfortunately are not yet smart to figure this out themselves. For
>> different compilers, the required level of niceness will surely differ.
>
> The whole point of using the compiler is avoiding the flea hunt like
> that. The C++ code should be nice, portable and easy to modify.
>
>> So far, I had to resort to assembly in DSP code just once, to implement
>> some rather funky kind of loop unrolling.
>
> After I rewrote the critical chunk of BF VDSP code in assembly, the
> throughput of the system was increased by the mere factor of 5.

The biggest thing is not tweaking the code, but tweaking the use of memory. If you blindly let the compiler and linker do their work, performance can be poor. If you then make sure the key vectors are in internal memory, possibly by some additional vector copies to conserve the precious internal memory, the improvement can be huge.

Steve
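The "additional vector copies" idea can be sketched as block processing through a small fast buffer. This is an illustration under assumptions only: `fast_buf` is an ordinary static array standing in for on-chip memory (on a real target it would be placed there with the toolchain's own section or linker directives), and `FAST_WORDS` and `energy_blocked` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define FAST_WORDS 256

/* Stand-in for a buffer located in internal (on-chip) memory. Here it is
   just a static array; real placement is toolchain-specific. */
static short fast_buf[FAST_WORDS];

/* Sum of squares over a large array held in slow memory, processed block
   by block: copy each block into the fast buffer, then compute from it. */
long energy_blocked(const short *slow, size_t n)
{
    long acc = 0;
    while (n > 0) {
        size_t chunk = n < FAST_WORDS ? n : FAST_WORDS;
        memcpy(fast_buf, slow, chunk * sizeof fast_buf[0]);
        for (size_t i = 0; i < chunk; i++)
            acc += (long)fast_buf[i] * fast_buf[i];
        slow += chunk;
        n -= chunk;
    }
    return acc;
}
```

In this toy kernel each sample is read only once, so the copy would not pay for itself. The pattern becomes worthwhile when each block is reused several times, or when a DMA engine performs the copy in the background.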






